UNCLASSIFIED

REPORT DOCUMENTATION PAGE

1a. Report security classification: UNCLASSIFIED
AD-A283 825
3. Distribution/availability of report: Approved for public release;
   distribution unlimited.
8a. Funding/sponsoring organization: ONR/ONT, ARL, VA 22217
11. Title (include security classification): Eigenvalue Tests and Distributions
    for Small Sample Order Determination for Complex Wishart Matrices (U)
12. Personal author(s): Curtis Irvin Caldwell
13a. Type of report: Final
14. Date of report: August 1994

18. Subject terms: mathematical statistics, probability, hypothesis tests,
complex variables, principal components analysis, complex Wishart distribution,
eigenstructure signal processing, characteristic functions of multivariate
complex variables, changes of multivariate complex variables, complex matrix
derivatives, exterior products, wedge products, complex linear algebra,
Lebesgue-Radon-Nikodym theorem, zonal polynomials of complex matrix argument,
eigenvalues, eigenvectors, sphericity test, beamforming, acoustics, array signal
processing.

19. Abstract:

This thesis looks at tests to determine how many signal sources exist in the
medium when constrained to using only a few samples. It applies classical
hypothesis testing assuming complex multivariate Gaussian random variables. The
critical issue is the derivation of probability density functions of appropriate
test statistics.

This thesis includes a comprehensive development of the tools of statistics of
complex variables for engineers and physicists. This includes complex matrix
derivatives, changes of complex variables, and properties of the characteristic
function of a complex multivariate random variable.

Probability density functions are derived for the set of eigenvalues satisfying
the generalized eigenvalue problem of two complex Wishart matrices, the matrix
complex Normal distribution, a joint distribution needed to derive the density
for the sphericity test statistic, the ratio of averages of disjoint sums of
sequential eigenvalues of a complex Wishart matrix, and several tests based on
the ratio of an arbitrary eigenvalue to the maximum, minimum, average, or sum of
all the eigenvalues for a special case of the complex Wishart matrix.

This thesis also contains a few minor results regarding zonal polynomials of
complex matrix argument, and a tutorial on the use of the Lebesgue-Radon-Nikodym
theorem for estimation.

20. Distribution/availability of abstract: Unclassified/unlimited
21. Abstract security classification: UNCLASSIFIED
22a. Name of responsible individual: Curtis Irvin Caldwell

DD Form 1473, 84 MAR

UNCLASSIFIED


The  Pennsylvania  State  University 
The  Graduate  School 


The  Graduate  Program  in  Acoustics 


EIGENVALUE  TESTS  AND  DISTRIBUTIONS  FOR 
SMALL  SAMPLE  ORDER  DETERMINATION  FOR 
COMPLEX  WISHART  MATRICES 


A  Thesis  in 
Acoustics 


by 

Curtis  Irvin  Caldwell 


(c)  1993  Curtis  Irvin  Caldwell 


Submitted in Partial Fulfillment
of the Requirements
for the Degree of

Doctor of Philosophy


August 1994


We  approve  the  thesis  of  Curtis  Irvin  Caldwell 


Date  of  Signature 


Leon  H.  Sibul 

Senior  Scientist  and  Professor  of  Acoustics 
Thesis  Adviser 
Chair  of  Committee 


Carter  L.  Ackerman 

Associate  Professor  of  Engineering  Research 


Sabih  I.  Hayek 
Distinguished  Professor 
of  Engineering  Mechanics 


Diana  F.  McCammon 
Senior  Research  Associate 
Associate  Professor  of  Acoustics 


Jiri  Tichy 

United  Technologies  Professor  of  Acoustics 
Chair  of  the  Graduate  Program  in  Acoustics 



ABSTRACT 

This  is  a  theoretical  thesis.  The  goal  is  to  determine  how 
many  signal  sources  exist  in  the  medium  when  constrained  to  using 
only  a  few  samples.  The  need  to  make  decisions  based  on  only 
a  few  samples  is  motivated  by  the  slow  sound  propagation  speed 
and  the  time  urgency  to  make  decisions.  This  research  treats 
the  problem  from  the  point  of  view  of  classical  hypothesis  testing 
assuming  complex  multivariate  Gaussian  random  variables.  This  is 
the  small  sample  complex  principal  components  analysis  problem. 
The  critical  issue  is  the  derivation  of  probability  density  functions 
of  appropriate  test  statistics.  The  goal  has  been  partially  achieved. 

The probability density functions for several important distributions have been
derived. In particular, these include the distribution for the set of eigenvalues
satisfying the generalized eigenvalue problem of two complex Wishart matrices,
the matrix complex Gaussian distribution, a joint distribution needed to derive
the density for the sphericity test statistic, the density function for the ratio
of averages of disjoint sums of sequential eigenvalues of a complex Wishart
matrix, and statistics for several tests based on the ratio of an arbitrary
eigenvalue to the maximum, minimum, average, or sum of all the eigenvalues for a
special case of the complex Wishart matrix. This thesis includes a derivation,
carried out entirely in the context of complex variables, of the density function
of the complex Wishart distribution and of the distribution of its eigenvalues.
It also includes a few minor results regarding zonal polynomials of complex
matrix argument.

A comprehensive development of the tools of statistics of complex variables for
engineers and physicists is provided. This includes a study of complex matrix
derivatives, changes of complex variables, and properties of the characteristic
function of a complex multivariate random variable. A derivation of the complex
Hotelling’s T² test statistic and its distribution, useful for tests on means,
is given. A tutorial on Kiefer and Wolfowitz’ application of the
Lebesgue-Radon-Nikodym theorem to the estimation approach is provided.
Chapter 1


INTRODUCTION

1.1  Characterization  of  Thesis 

1.1.1  Focus  of  Thesis 

The focus of this thesis is the development of tools and the construction of
methods for determining the number of point sources present in a measured
acoustic field. There are several good approaches to this problem. The approach
examined in this thesis is that of Principal Components Analysis for the small
sample case of signals and noise arising from the matrix complex normal
probability distribution. This distributional assumption is a typical starting
point for problems in array processing. The forms of test statistics applicable
to this problem have long been known. The hard part of the problem is obtaining
the sampling distributions of those statistics. In this thesis, the distributions
of test statistics have been developed for some of the simple (and, hence,
unrealistic) cases. Although much work remains to be done, this thesis does
develop significant tools required for the further study of this problem, and it
partially develops the derivations of the ultimately desired distributions.


1.1.2  Discipline  Home  of  Thesis 

A major criticism leveled against this thesis is that it is not a thesis in
acoustics. It is true that most of the work produced in the course of this study
does not have the flavor most acousticians would recognize, yet it was originally
(and still is) solidly motivated by a problem in acoustics. Because the terminal
goal of this research has not been reached, it is not yet possible to demonstrate
its application to acoustics via experiment or simulation. However, because of
the research accomplishments of this thesis, the day when that might be possible
is now closer (in event time measure).

The bulk of this thesis is multivariate statistics of complex variables.
Statisticians generally would not claim this work because of the extensive use
of complex variables. The most difficult contributions of this thesis are
grounded in topological group representation theory, yet mathematicians would
not generally claim this work because it is too applied. Nevertheless, the key
observation in this thesis (the justification of Gross and Richards’ splitting
theorem for zonal polynomials of two complex Hermitian matrix arguments, and its
application to the derivation of the joint probability density of eigenvalues of
a Hermitian Wishart matrix) requires such a treatment to establish it. It is
appropriate to remark here that the most widely useful results of this thesis,
which include the systematic redaction of the linear algebra, differential and
integral calculus, and statistics of complex variables, are accessible to most
engineering juniors.

Although  signal  processing  most  often  finds  its  academic  home  in  electrical 
engineering,  electrical  engineers  most  often  deal  with  applications  which  make 
use  of  large  sample  sizes.  I  am  interested  in  the  small  sample  size  case.  Further, 
the  use  of  the  exterior  product  in  developing  Jacobians  for  changes  of  complex 
variables  is  uncommon  among  electrical  engineers.  Signal  processing  is  most 
properly  classified  as  an  information  science,  and  is  quite  independent  of  the 
use of electrons to implement its ideas. Another difference is that acoustic
signals propagate far more slowly than electromagnetic signals. So, this thesis
must reside in an interdisciplinary home.
With  this  major  impediment  set  aside,  let  us  continue  with  the  description  of 
the  background  and  content  of  the  subject  matter. 

1.1.3  What  This  Thesis  is  Not 

This  thesis  is  not  about: 

•  devising  new  signal  processing  structures 

•  faster  or  more  robust  algorithms 

•  inventing  new  statistical  tests 

•  comparing  old  tests 

•  finding  asymptotic  distributions  of  test  statistics 


•  examining  Cramer-Rao  bounds  for  estimators 

•  assessing  estimator  consistency 

•  simulating  results 

1.2  Order  Estimation 

This thesis concentrates on the problem of determining the number of significant
sources present at an array in a noisy environment. This is known as system order
determination or system identification in other contexts. More correctly, the
question being investigated is the number of arrival paths containing signals
that can be distinguished from noise. Often, the question is asked for a fixed
frequency.

Several studies in signal processing assume that the system order is given or can
be obtained. One important work is the introduction of the Multiple Signal
Classification (MUSIC) algorithm by Schmidt [238], which requires knowledge of
the number of eigenvalues of the received data matrix that are associated with
noise. Another study that requires knowledge of the number of received signals
is the thesis on maximum likelihood estimation by Mirkin [183] (p. 37). In
Tague’s study [263] (p. 140) of stochastic operators and their matrix
representations applied to estimator-correlator processors, he examined the
relationship between system identification and receiver performance. He showed
that perfect identification is not required in order to improve processor gain,
and that even poor identification conducted at a low signal-to-noise ratio (SNR)
results in some improvement. The approach of this thesis relies on the testing
of eigenvalues.


1.3 Eigensolutions

The number of arrival paths is related to the eigenvalues of the covariance
matrix of the array element outputs and is independent of array geometry. I
assume in this thesis that the array is unstructured. The eigenvectors of this
covariance matrix are functions of the array geometry and the directions of
arrival.

Morrison [186] has a wonderful discussion of the geometric interpretation of
eigenvalues and eigenvectors in his treatment of principal components. The
eigenvectors define a coordinate system. The eigenvector associated with the
largest eigenvalue defines the linear combination of the data that produces the
maximum variance. The eigenvector associated with the second largest eigenvalue
defines the linear combination of the data that produces maximum variance
subject to the restriction that the second eigenvector is orthogonal to the
first. Successive axes are defined in the same way. The sample eigenvalues are
the variance estimates of the linear combinations of the data defined by the
associated eigenvectors. When the eigenvectors are normalized to unit length,
they can be thought of as direction cosines which specify the rotation from the
original response axes of the data to the axes given by the set of eigenvectors.
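The variance interpretation above can be checked numerically. The following is a minimal Python/NumPy sketch. It assumes zero-mean complex snapshots stored one per column, with the sample covariance normalized by the number of snapshots n. The function name `principal_axes` and the synthetic data are illustrative only and are not taken from Morrison.

```python
import numpy as np

def principal_axes(data):
    """Eigen-decompose the sample covariance of zero-mean complex data.

    data: array of shape (p, n); each column is one snapshot from p sensors.
    Returns eigenvalues in descending order and the matching unit eigenvectors.
    """
    n = data.shape[1]
    s = data @ data.conj().T / n            # sample covariance (mean assumed zero)
    evals, evecs = np.linalg.eigh(s)        # eigh returns ascending order for Hermitian input
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

rng = np.random.default_rng(0)
p, n = 3, 50
data = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
data[0] *= 3.0                              # give one channel a larger variance

evals, evecs = principal_axes(data)
# Project the data onto each eigenvector; the sample variance of each
# projection reproduces the corresponding eigenvalue.
proj = evecs.conj().T @ data
proj_var = np.mean(np.abs(proj) ** 2, axis=1)
print(np.allclose(proj_var, evals))         # True
```

Since each projection variance equals v^H S v for a unit eigenvector v, the check holds for any data, not just this random draw.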

To understand the effect of eigenvalue separation on the accuracy of directions
of arrival computed from the associated eigenvectors, consider the eigenvectors
as being the axes of an n-dimensional ellipsoid. Think of the square roots of the
eigenvalues (the singular values) as being the lengths of the semi-axes. Now,
visualize an ellipsoidal shell conforming to this geometry. The sharpness of
curvature of the ellipsoidal shell can be thought of as a measure of the
stability of the direction-of-arrival estimate, or bearing accuracy.

If  all  the  eigenvalues  are  equal,  you  have  a  ball!  Hence,  a  test  for  equality 
of  eigenvalues  is  often  called  a  sphericity  test.  There  are  an  infinite  number 
of  possible  3-dimensional  orthogonal  coordinate  systems  that  you  can  fit  to 
a  3-dimensional  sphere.  Assuming  that  the  origin  of  all  coordinate  systems 
is  at  the  center  of  this  sphere,  the  first  choice  is  an  arbitrary  point  on  the 
sphere,  like  the  North  Pole.  The  number  of  choices  is  uncountable.  This 
fixes  the  first  coordinate  (eigenvector).  The  second  coordinate  is  constrained 
to  be  orthogonal  to  the  first,  which  places  the  second  choice  for  a  point  on 
the  sphere’s  equator.  Even  here  the  number  of  choices  is  uncountable.  In 
general,  for  an  n-dimensional  sphere,  the  number  of  axes  for  which  there  are 
uncountable  choices  of  orientations  is  n-1. 
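A test for equality of eigenvalues needs a scalar measure of how far the ellipsoid is from a ball. One common choice is the ratio of the geometric mean to the arithmetic mean of the eigenvalues. It is shown here only as an illustration and is not necessarily the statistic developed later in this thesis. Its p-th power is the classical likelihood-ratio (Mauchly) sphericity criterion, det(S)/(tr(S)/p)^p. By the arithmetic-geometric mean inequality, the ratio equals 1 only for a ball. A minimal Python/NumPy sketch follows; the function name `sphericity_ratio` is hypothetical.

```python
import numpy as np

def sphericity_ratio(eigenvalues):
    """Ratio of the geometric mean to the arithmetic mean of positive eigenvalues.

    Equals 1 when all eigenvalues are equal (the spherical case) and is
    strictly less than 1 otherwise (arithmetic-geometric mean inequality).
    """
    lam = np.asarray(eigenvalues, dtype=float)
    if np.any(lam <= 0):
        raise ValueError("eigenvalues must be positive")
    geometric = np.exp(np.mean(np.log(lam)))
    arithmetic = np.mean(lam)
    return geometric / arithmetic

print(sphericity_ratio([2.0, 2.0, 2.0]))  # approximately 1.0 (a ball)
print(sphericity_ratio([4.0, 1.0]))       # approximately 0.8 (an elongated ellipse)
```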

There  will  be  as  many  non-zero  eigenvalues  as  there  are  sources  when 
there are more sensors than sources and there is no noise. If there is noise
then  all  the  eigenvalues  will  be  nonzero.  The  sensor  outputs  are  random 
variables  and  hence  the  eigenvalues  and  eigenvectors  of  the  sample  covariance 
matrix  are  random  variables.  When  the  signal-to-noise  ratio  is  large,  the  large 
eigenvalues  are  associated  with  the  signal  plus  noise  and  the  small  eigenvalues 
are  associated  with  the  noise.  When  the  signal-to-noise  ratio  is  small,  the 
determination  of  the  exact  number  of  sources  is  not  as  easy.  The  primary 
question  of  this  thesis  is  as  follows. 

Given two eigenvalues (or groups of eigenvalues) from a noisy process,
is  the  difference  between  them  due  to  mere  chance, 
or  is  it  more  likely  due  to  some  underlying  real  cause? 
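The noise-free and noisy cases described above can be illustrated with population (not sample) covariance matrices. The sketch below assumes p = 6 sensors and q = 2 sources with random complex steering vectors. It also assumes source powers of 4 and 1 and spatially white noise of variance 0.5; all of these values are hypothetical. Without noise there are exactly q nonzero eigenvalues. Adding white noise shifts every eigenvalue by the noise variance. With a finite number of samples the noise eigenvalues no longer coincide, which is what makes the question above hard.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 6, 2                                   # sensors, sources (hypothetical values)
# Hypothetical random complex steering vectors; any full-rank choice works.
a = rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q))
source_power = np.diag([4.0, 1.0])

r_signal = a @ source_power @ a.conj().T      # noise-free covariance, rank q
noise_var = 0.5
r_noisy = r_signal + noise_var * np.eye(p)    # add spatially white noise

eig_signal = np.linalg.eigvalsh(r_signal)[::-1]   # descending order
eig_noisy = np.linalg.eigvalsh(r_noisy)[::-1]

tol = 1e-9 * eig_signal[0]
print(int(np.sum(eig_signal > tol)))                  # 2 nonzero eigenvalues
print(np.allclose(eig_noisy, eig_signal + noise_var))  # True: each eigenvalue shifts by noise_var
```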

The sensitivity of the accuracy of eigenvectors as a function of (a) eigenvalue
separation, (b) the underlying distribution, determined by the α-mixing of two
Gaussian distributions, and (c) the covariance estimation method (conventional
sample covariance estimation, rank correlation, weighted M-estimate) was the
subject of a simulation study by Moghaddamjoo [184]. The concept of α-mixing
refers to the convex sum of two or more probability distributions. For the
simple two-distribution case, one of the distributions can be called a
contaminating distribution. Conventional estimation was best when there was no
α-mixing. The rank-correlation (robust) method was best when the contamination
factor was 0.1. The weighted M-estimate was never best. These results were
observed at all signal-to-noise ratios. As expected from the geometrical
interpretation, when the signal-related eigenvalues were not well separated from
each other or from the noise, the estimates of the related eigenvectors were very
different from their true values. As long as good eigenvalue separation existed,
the space spanned by the estimated signal eigenvectors was almost the same as the
true signal space. When a signal-related eigenvalue was close to the noise
eigenvalues, there was significant mixing between its corresponding eigenvector
and the noise-related eigenvectors. The only remedy was to increase the overall
array signal-to-noise ratio by increasing the number of sensors and filtering the
noise as much as possible.
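The α-mixing idea can be made concrete with a scalar example. This minimal sketch assumes a zero-mean standard Gaussian mixed with a wider contaminating Gaussian at the contamination factor 0.1 mentioned above. The contaminating standard deviation of 3 is a hypothetical choice. The sketch does not reproduce Moghaddamjoo's complex multichannel setup. Because the mixture density is a convex sum, its variance is the same convex sum of the component variances.

```python
import numpy as np

def alpha_mixture_samples(n, alpha, scale, rng):
    """Draw n samples from (1 - alpha) * N(0, 1) + alpha * N(0, scale**2).

    The second component plays the role of the contaminating distribution.
    """
    contaminated = rng.random(n) < alpha      # pick a mixture component for each draw
    x = rng.standard_normal(n)
    x[contaminated] *= scale
    return x

def alpha_mixture_variance(alpha, scale):
    # For zero-mean components, the variance of the convex sum of densities
    # is the same convex sum of the component variances.
    return (1.0 - alpha) * 1.0 + alpha * scale ** 2

rng = np.random.default_rng(2)
x = alpha_mixture_samples(200_000, alpha=0.1, scale=3.0, rng=rng)
print(alpha_mixture_variance(0.1, 3.0))       # about 1.8
print(np.var(x))                              # close to 1.8
```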

The problem reduces to looking at the sample eigenvalues to test whether the
corresponding population eigenvalues are the same or significantly different.
More generally, the hypothesis I would like to test is
H: $c_1^T \Lambda c_1 = c_2^T \Lambda c_2$ versus the alternative
A: $c_1^T \Lambda c_1 > c_2^T \Lambda c_2$, where $c_1$ and $c_2$ are column
vectors of real numbers that specify linear combinations of the eigenvalues
contained in the diagonal matrix $\Lambda$. This is equivalent to a test proposed
by Krishnaiah and Lee [153], who did not provide an expression for the
distribution involved.
1.4 Major Assumptions and Rationale for Approach

Since an eigenvalue is the square of its related singular value, we can test the
squares of the sample singular values to determine the appropriate rank of an
approximating covariance matrix for an eigensystem processor [212]. This rank is
known as the system order. Once known, beams can be formed to
maximize  the  signal-to-noise  ratio  in  the  desired  look-directions  by  cancelling 
out  the  interfering  point  sources  using  methods  described  in  Monzingo  and 
Miller  [185].  The  mathematics  for  optimal  processing  has  been  worked  out 
when  the  system  order  is  known.  Progress  in  the  development  of  statistical 
estimation  techniques  that  apply  to  this  problem  is  still  being  made.  For 
example,  see  the  fascinating  thesis  by  Kundu  [158].  The  hypothesis  testing 
approach  has  received  little  attention. 
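The relationship between eigenvalues and singular values used above can be checked directly. The eigenvalues of X X^H are the squares of the singular values of the data matrix X. The sketch below uses Python/NumPy with a hypothetical random data matrix of 4 sensors and 3 snapshots. It also shows that with fewer snapshots n than sensors, X X^H has rank at most n, so the remaining eigenvalues are zero.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 4, 3                                   # 4 sensors, only 3 snapshots (hypothetical)
x = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))

singular_values = np.linalg.svd(x, compute_uv=False)     # descending, length min(p, n)
eigenvalues = np.linalg.eigvalsh(x @ x.conj().T)[::-1]   # descending, length p

print(np.allclose(eigenvalues[:n], singular_values ** 2))  # True
print(np.allclose(eigenvalues[n:], 0.0))                   # True: rank is at most n
```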

The order estimation problem can be approached from a strategy of estimation or
a strategy of hypothesis testing. If you choose an estimation strategy, you must
know how good your answer is. A confidence level (1 − α) must be chosen to form
a confidence interval. If you choose hypothesis testing, the size α of the test
must be chosen to construct the critical value against which the test statistic
is compared. In both strategies the choice of α is subjective, whether the choice
is made directly or indirectly, such as via cost and utility functions.
Regardless of your strategy, you can construct a better estimator or hypothesis
test if you know more about the distributions involved. Even the assumption that
the data are drawn from an exponential family distribution is subjective, even
when that hypothesis is not rejected by testing. Explicitly identified
subjectivity is not necessarily bad. It enables us to build tractable models and
efficiently achieve reasonable results. The charge of “subjectivity” lodged
against hypothesis testing by proponents of estimation is neither a valid defense
of estimation nor a valid claim of advantage of estimation over testing.
Estimation and testing both require a choice of α for the results to be
meaningful, and thus are based on the same underlying theory. Both are worthy
candidates for investigation and development. The advantage of estimation over
hypothesis testing is that less work is usually involved in obtaining an answer.
The usefulness of the answer, however, can only be assessed by assuming a value
for α and applying distributional theory.
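When the exact null distribution of a statistic is not available, the critical value for a test of size α is sometimes approximated by simulation. The sketch below is only such a stand-in and is not the approach pursued in this thesis, whose aim is the exact sampling distributions. It assumes the ratio of the geometric to the arithmetic mean of the eigenvalues as the sphericity statistic and identity covariance under H. The sizes p = 4, n = 8 and α = 0.05 are hypothetical, as are both function names. Small ratios are taken as evidence against sphericity, so the rejection region is the lower tail.

```python
import numpy as np

def sphericity_ratios(p, n, trials, rng):
    """GM/AM eigenvalue ratio of `trials` sample covariances drawn under H: Sigma = I."""
    shape = (trials, p, n)
    z = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    s = z @ np.conj(np.swapaxes(z, 1, 2)) / n     # batched p x p sample covariances
    lam = np.linalg.eigvalsh(s)                    # batched real eigenvalues, shape (trials, p)
    return np.exp(np.mean(np.log(lam), axis=1)) / np.mean(lam, axis=1)

def critical_value(p, n, size, trials, rng):
    # Reject H when the statistic falls below the lower `size` quantile
    # of its simulated null distribution.
    return np.quantile(sphericity_ratios(p, n, trials, rng), size)

p, n, size = 4, 8, 0.05
crit = critical_value(p, n, size, trials=20_000, rng=np.random.default_rng(5))

# Fresh null draws should be rejected in roughly a fraction `size` of trials.
fresh = sphericity_ratios(p, n, 20_000, np.random.default_rng(6))
print(crit, np.mean(fresh < crit))
```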

One  characteristic  of  acoustic  signal  processing  that  distinguishes  it  from 
processing  electromagnetic  signals  is  the  comparatively  slow  propagation  speed 
of  acoustic  signals.  In  radar,  if  you  need  more  independent  samples  to  satisfy 
applicability  of  the  central  limit  theorem,  you  increase  your  pulse  repetition 
rate.  In  acoustics,  the  speed  at  which  data  is  propagated  is  slow  compared  to 
the speed of light. This means that decisions must be based on a restricted
number  of  independent  samples  available  per  unit  time.  This  drives  interest 
to  the  small  sample  case.  The  desirability  of  working  with  a  small  sample  size 
distinguishes  this  problem  in  acoustics  from  one  in  electrical  engineering  which 
is  usually  satisfied  by  the  large  sample  case. 

The desire to work with complex random variables distinguishes the work in this
thesis from work that might usually be found in statistics. Bandpass acoustic
data are naturally represented with complex numbers. The primary interest in
using complex variables in the development of theory is the natural and
convenient representation of the time dependency of physical variables by using
the form exp(iωt). By applying the Hilbert transform to the array element data,
the resulting data stream can be represented as complex numbers. Application to
actual data allows us to efficiently do phase comparisons and computations. A
very nice discussion in the sonar context is in Ziomek’s 1985 book [299]
(pp. 176-189). Let our real data stream be the variable x(t) and let the Hilbert
transform of x(t) be y(t). The usual notation for the Hilbert transform of x(t)
is x̂(t). You can think of the Hilbert transform as being a quadrature filter
having x(t) as its input. Then our complex data stream is formed by
z(t) = x(t) + iy(t).
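The construction of z(t) can be sketched with the discrete Fourier transform. Zeroing the negative-frequency bins and doubling the positive ones yields x(t) + i x̂(t). This minimal Python/NumPy sketch assumes a finite, uniformly sampled record treated as periodic. The function name `analytic_signal`, the sample rate and the test tone are hypothetical. SciPy's `scipy.signal.hilbert` returns the same analytic signal. For a cosine input, the result is the complex exponential exp(iωt), matching the representation discussed above.

```python
import numpy as np

def analytic_signal(x):
    """Return z = x + i*y, where y is the discrete Hilbert transform of real x.

    FFT construction: keep the DC bin (and the Nyquist bin for even length),
    double the positive-frequency bins, and zero the negative-frequency bins.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

fs, f0 = 256, 5                               # hypothetical sample rate and tone frequency
t = np.arange(fs) / fs                        # one second: an integer number of cycles
z = analytic_signal(np.cos(2 * np.pi * f0 * t))
print(np.allclose(z, np.exp(1j * 2 * np.pi * f0 * t)))   # True: cos -> exp(i*omega*t)
print(np.allclose(z.imag, np.sin(2 * np.pi * f0 * t)))   # True: Hilbert transform of cos is sin
```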

A common assumption for purposes of mathematical simplicity when first developing
theory for an application in signal processing is that the process is
stationary. Application to array processing leads to consideration of complex
multivariate distributions. The assumption of Gaussian white noise is a
traditional starting point in signal processing studies because it simplifies the
mathematics involved and it is not a bad model for a wide range of situations.

Wooding [293] is often cited as the beginning point for work with complex normal
random variables because he connected it with application to the envelope of a
random noise signal. He considered the form of the covariance matrix and density
function of the random variable $z_n(t) = x_n(t) + i y_n(t)$, where $x_n$ and
$y_n$ are independent normal random variables. Thus, for the complex scalar
$z_n(t)$, the real and imaginary parts, $x_n(t)$ and $y_n(t)$, are uncorrelated.
He showed that the covariance matrix for the real and imaginary parts of two such
complex normal random variables, $z_m$ and $z_n$, satisfied the following
conditions: $E\{y_m y_n\} = E\{x_m x_n\}$ and $E\{x_m y_n\} = -E\{x_n y_m\}$. He
derived the density function and the characteristic function of the vector
complex normal distribution for the zero mean case. Goodman [92] (p. 173), a
pioneer in the study of complex Gaussian statistics, remarked that many
stationary non-Gaussian processes become nearly Gaussian when “passed through”
sufficiently narrowband filters. Bendat and Piersol [39] provide a cautionary
remark that physical phenomena and measured data ultimately are limited by
nonlinear restraints in the positive and negative directions, so no random data
can be truly Gaussian. Therefore the Gaussian distribution is not appropriate for
looking at extreme events, which are events located in the tails of the
distribution. This is precisely where our interest lies for the detection
problem, and I will conveniently ignore their wise cautionary remark under the
rubric that one should understand what is easy before trying to understand what
is hard. Attention is focused on the complex multivariate normal and related
distributions. The complex Wishart distribution is the natural distribution for
examining the variability of a sample spectral density matrix.
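Wooding's conditions describe the covariance structure of what is now usually called a circularly symmetric (proper) complex normal vector. As an illustrative sketch, assume the normalization $E\{z z^H\} = \Sigma = A + iB$ with zero pseudo-covariance. The covariance of the stacked real vector $[x; y]$ is then $\frac{1}{2}\begin{bmatrix} A & -B \\ B & A \end{bmatrix}$. The Python/NumPy code below builds that matrix for a hypothetical positive definite Hermitian $\Sigma$ and checks both of Wooding's conditions. The function name `real_covariance` is also hypothetical.

```python
import numpy as np

def real_covariance(sigma):
    """Covariance of the stacked real vector [x; y] for circular z = x + i*y.

    sigma = E{z z^H} = A + iB is Hermitian, so A is symmetric and B is
    antisymmetric. Under circular symmetry, E{x x^T} = E{y y^T} = A/2 and
    E{x y^T} = -B/2.
    """
    a, b = sigma.real, sigma.imag
    return 0.5 * np.block([[a, -b], [b, a]])

sigma = np.array([[2.0, 1.0 - 0.5j, 0.3j],
                  [1.0 + 0.5j, 1.5, 0.2],
                  [-0.3j, 0.2, 1.0]])        # hypothetical Hermitian, positive definite
p = sigma.shape[0]
c = real_covariance(sigma)
cxx, cyy, cxy = c[:p, :p], c[p:, p:], c[:p, p:]

print(np.allclose(cxx, cyy))          # True: E{x_m x_n} = E{y_m y_n}
print(np.allclose(cxy, -cxy.T))       # True: E{x_m y_n} = -E{x_n y_m}
print(np.allclose(np.diag(cxy), 0))   # True: real and imaginary parts of each z_n uncorrelated
```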

This  thesis  focuses  on  hypothesis  testing  strategies. 

The  problem  is  examined  in  the  context  of  a  complex  variable  small  sample 
principal  components  analysis  problem. 
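To see why the small sample case is hard, consider the null case in which all population eigenvalues are equal. The sketch below draws one complex Wishart matrix $W = ZZ^H$ with $n$ degrees of freedom from $p$-sensor snapshots with identity covariance. It then prints the eigenvalues of the sample covariance $W/n$. The sizes p = 4 and n = 8 and the function name `complex_wishart` are hypothetical. The sample eigenvalues typically spread noticeably around 1 even though every population eigenvalue equals 1.

```python
import numpy as np

def complex_wishart(p, n, rng):
    """Return W = Z Z^H, where Z is p x n with i.i.d. CN(0, 1) entries.

    Each column of Z is one snapshot from p sensors, so W is complex Wishart
    with n degrees of freedom and identity covariance parameter.
    """
    z = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
    return z @ z.conj().T

rng = np.random.default_rng(4)
p, n = 4, 8                                      # few snapshots per sensor
w = complex_wishart(p, n, rng)
sample_eigs = np.linalg.eigvalsh(w / n)[::-1]    # sample covariance eigenvalues, descending
print(sample_eigs)  # every population eigenvalue is 1, yet these are spread out
```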


1.5  Organization  of  Thesis 

This thesis is organized as follows. The chapters contain the material which I
judged to be mathematically accessible to most engineers and most directly
related to the hypothesis testing question. The appendices contain the supporting
mathematical background, or results which I judged not commonly accessible to
most engineers.

Chapter 2 provides a mathematical statement of the problem as one of a small
sample complex principal components test. Chapter 3 reviews other applications
that can benefit from eigenvalue tests. Chapter 4 identifies approaches to the
order estimation problem different from the one taken in this thesis. It also
includes an exposition of Kiefer and Wolfowitz’s [140] generalizations of maximum
likelihood estimators. This discussion provides an abstract setting within which
model order identification and estimation can be viewed as parts of the same
problem of selecting one or a family of probability measures from among
candidates. Chapter 5 reviews previous work on order determination by hypothesis
testing. Chapter 6 specifies some statistical tests of interest. Chapter 7
contains the summary and conclusions. Chapter 8 contains recommendations for
further research.

The first appendix highlights the mathematical background necessary for this
thesis. It identifies good preparatory references and gives examples that
illustrate the need for special care and attention to detail. It also outlines
the major structure of the three groups of appendices. The last appendix
identifies notation conventions and defines special symbols and functions. It is
located at the very end to make it easy to use.

The appendices are perhaps the most valuable part of this thesis. They lay the
groundwork to support many other efforts. The experienced reader will have used
many of these results, having found them in isolated literature, or will have
developed them independently. I know of no systematic, thorough presentation of
these results explicitly for the complex case. Perhaps the closest to achieving
this is the fine text by Stewart [259]. Consequently, I have taken the liberty of
developing results related to my general theme even when they fall outside the
very narrow line of reasoning that a court of law would expect in exploring the
stated thesis topic. This development was a labor of love initially patterned
after Chapter 17 of the wonderful text by Arnold [31]. It expanded to include
work derived in great measure from Muirhead [187] and Anderson [26]. These
appendices are not in natural pedagogical order, but rather are grouped by my
anticipation of which material would be useful to different kinds of readers.

Appendices  A  through  F  are  accessible  to  most  engineers  and  are  directly 
related to this thesis. If this thesis is ever read, I expect this group
of appendices to be of the most use to other people. Those who insist on
practical  results  can  find  some  in  the  wonderful  work  by  Tague  [264],  which 
is  presented  here  with  some  steps  that  were  omitted  in  his  journal  article  due 
to  lack  of  space.  Appendices  G  through  J  are  at  a  more  abstract  level.  The 
most challenging contributions made in this thesis are given in equation G.10,
material  related  to  equation  G.16,  and  theorem  98,  all  contained  in  appendix 
G.  All  of  appendices  G  through  J  are  necessary  for  complete  understanding  of 
this  thesis.  Much  of  it  is  not  new  knowledge,  but  is  included  to  allow  engineers 
to  get  access  to  the  necessary  mathematical  background  quickly.  Appendices 
K  through  P  form  a  repository  of  results  that  are  mundane,  useful  (for  the 
most  part),  and  are  not  generally  available  elsewhere.  Other  than  Appendices 
H.1 through H.5 and I, the appendices are results which I have recast from
real  variables  into  the  complex  variables  case,  or  are  results  I  have  not  seen 
elsewhere  even  for  the  real  variables  case  (yet).  The  most  interesting  results  in 
this  group  of  appendices  are  in  appendix  L,  and  the  easy  results  that  were  fun 
to  produce  are  in  appendix  N.  The  most  important  of  this  group  of  appendices 
is  appendix  M,  and  the  most  difficult  to  produce  was  appendix  P. 


Chapter  2 


MATHEMATICAL STATEMENT OF PROBLEM

2.1  Introduction 

In this chapter, I provide a mathematical statement of the problem and the
test statistics known to apply to it. In a later chapter, I also include a few
other statistics applicable to the order identification problem.

The  basic  mathematical  problem  can  be  stated  as  follows.  Assume  that 
we  have  m  arbitrarily  oriented  sensors  and  p  sources.  In  particular,  I  am  not 
restricting  this  to  a  study  requiring  a  linear  array.  We  know  m  and  we  want 
to  find  p.  The  value  of  m  is  selected  with  the  intention  that  the  assumption 
m  >  p  is  valid.  Assume  the  Gaussian  white  noise  is  isotropic  and  independent 
of  the  signals  and  that  the  signals  are  mutually  independent.  We  want  the 
difference  between  a  signal  at  various  sensors  to  depend  only  on  the  time 
difference due to propagation between the source and the sensors. Therefore,
I accept the linearized equations for small-amplitude acoustics and assume that
the sensors are located in the acoustic (but not necessarily geometric) far field
of the sources. I do not require an assumption of plane wave propagation
across the array. Those are issues related to beamformer assumptions, which
are not within the scope of this thesis. The geometry is illustrated in figure
2.1.

Figure 2.1. Array Geometry

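If the array is sampled n times, where n > m, then we get the following
matrix. Each $X_i(k)$ is a complex random variable.

$$
X = \begin{bmatrix}
X_1(1) & X_2(1) & \cdots & X_m(1) \\
X_1(2) & X_2(2) & \cdots & X_m(2) \\
\vdots & \vdots & \ddots & \vdots \\
X_1(n) & X_2(n) & \cdots & X_m(n)
\end{bmatrix} \qquad (2.1)
$$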


Regardless of the origin of the elements of matrix X, we can determine
the rank of X by determining the number of nonzero singular values of X.
Alternately, we can determine the rank of X by examining the number of
nonzero eigenvalues of either $XX^H$ or $X^H X$. Independence of the samples is
not  required  for  the  singular  values  to  identify  the  rank  of  X.  When  I  finally 
derive  distributional  results  I  will  require  that  the  samples  be  independent  to 
simplify  the  mathematics.  This  will  allow  the  assertion  that  the  covariance 
matrix  is  a  complex  Wishart  matrix.  However,  a  future  development  should 
deal  with  X  without  the  sample  independence  constraint,  perhaps  via  studying 
the  singular  values. 
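The rank argument above can be checked numerically. This is a minimal sketch assuming NumPy and a hypothetical linear mixing model. The helper names `complex_gaussian` and `simulate_snapshots` and the mixing matrix `H` are illustrative and do not come from the thesis. With no noise, the number of nonzero singular values of X equals the number of sources, and the squared singular values coincide with the eigenvalues of $X^H X$.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the thesis's notation.
def complex_gaussian(rng, shape, scale=1.0):
    """Circularly symmetric complex Gaussian samples with E|z|^2 = scale**2."""
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def simulate_snapshots(n, m, p, noise_std, seed=0):
    """Assumed model: n snapshots (rows) at m sensors (columns) from p sources.
    S holds source waveforms (n x p); H is an arbitrary complex mixing (p x m)."""
    rng = np.random.default_rng(seed)
    S = complex_gaussian(rng, (n, p))
    H = complex_gaussian(rng, (p, m))
    return S @ H + complex_gaussian(rng, (n, m), noise_std)

X0 = simulate_snapshots(n=20, m=6, p=2, noise_std=0.0)   # noise-free data
l = np.linalg.svd(X0, compute_uv=False)                 # singular values, descending
eig = np.linalg.eigvalsh(X0.conj().T @ X0)[::-1]        # eigenvalues of X^H X, descending
numerical_rank = int(np.sum(l > 1e-10 * l[0]))          # count "nonzero" singular values
print(numerical_rank)                                   # 2
print(np.allclose(l**2, eig))                           # True
```

With any nonzero `noise_std`, all m singular values become nonzero with probability 1. This is why counting nonzero values gives way to hypothesis tests in what follows.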

In the absence of noise, the rank of this matrix is the number of sources. I
want to find a matrix A of lowest rank that is a best approximation of X in
some sense. Then $\nu = \operatorname{rank}(A)$ is the answer. The random variable $\nu$ is our
approximation to p, and the goal is to determine $\nu$. Suppose that the data in
matrix X includes noise. If some matrix V consists of only the noise data,
then we can examine the rank of $Z = X - V$. We may examine the rank of
X or Z directly by looking at their singular values obtained from a Singular
Value Decomposition (SVD), or by looking at their eigenvalues obtained from
an Eigenvalue Decomposition (EVD) of $X^H X$ or $Z^H Z$. Eaton and Perlman
[73] showed that $X^H X$ is of full rank m with probability 1. Okamoto
[197] showed that the eigenvalues of such a matrix are all distinct. Let the
SVD of X be given by $X_{n,m} = P L Q^H = \sum_{i=1}^{m} l_i P_i Q_i^H$, and let the singular values of
X be ordered according to $l_1 > l_2 > \cdots > l_m$. Let L be the rectangular
matrix containing the diagonal matrix of the singular values $\{l_i\}$ in its upper
left corner. The norm

$$
\min_{\rho(A) = \nu < m} \|X - A\|_F^2 = l_{\nu+1}^2 + \cdots + l_m^2
$$

is attained when $A = \sum_{i=1}^{\nu} l_i P_i Q_i^H$. The $\{P_i\}$ are the left singular vectors, and
they satisfy $P^H P = I_n$. The $\{Q_i\}$ are the right singular vectors, and they
satisfy $Q^H Q = I_m$. The $\{l_i^2\}$ are the non-zero eigenvalues of both $X^H X$
and $XX^H$. Let $B = X^H X$. Define $B_j = \sum_{i=j+1}^{m} l_i^2 Q_i Q_i^H$. The matrix $B_j$ is an
approximation of the matrix B formed with the smallest $(m - j)$ eigenvalues
and corresponding eigenvectors of matrix B. We will see these again in a
moment.
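The Eckart-Young property and the construction of $B_j$ can be verified numerically. This is a minimal sketch assuming NumPy. It uses the economy-size SVD, so its P has orthonormal columns ($P^H P = I_m$) rather than being the full $n \times n$ matrix. The random test matrix and sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, nu = 30, 5, 2                      # illustrative sizes
X = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)

# Economy SVD: X = P diag(l) Q^H with singular values l in descending order.
P, l, QH = np.linalg.svd(X, full_matrices=False)
Q = QH.conj().T

# Best rank-nu approximation A = sum_{i=1}^{nu} l_i P_i Q_i^H.
A = (P[:, :nu] * l[:nu]) @ QH[:nu, :]
residual = np.linalg.norm(X - A, 'fro') ** 2
print(np.isclose(residual, np.sum(l[nu:] ** 2)))   # True: l_{nu+1}^2 + ... + l_m^2

# B = X^H X and B_j = sum_{i=j+1}^{m} l_i^2 Q_i Q_i^H (the m - j smallest eigenpairs).
B = X.conj().T @ X
j = nu
B_j = (Q[:, j:] * l[j:] ** 2) @ Q[:, j:].conj().T
print(np.allclose(B, (Q[:, :j] * l[:j] ** 2) @ Q[:, :j].conj().T + B_j))  # True
```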

I essentially want to perform a test for sphericity on the smallest $m - \nu$
eigenvalues. We seek to determine if they are the same for practical purposes,
or if at least one of them is significantly different from the others. Proceed in a
sequential manner with different values of j until $\nu$ is determined. The
order in which you test is your test strategy. The order you choose depends on
your confidence in which direction of testing, from small to large eigenvalues or
large to small eigenvalues, will result in a successful identification of the rank
of the non-noise contribution to X with the least amount of computational
work.

Suppose you have no signals. With probability 1, no two sample eigenvalues
will be the same even though there is only one underlying population
eigenvalue. The smallest sample eigenvalue will underestimate the common
population eigenvalue, and the largest sample eigenvalue from this noise-only
matrix will overestimate the population eigenvalue. This means that if you
want to estimate the smallest population eigenvalue, you should use an average
of the sample eigenvalues you have classified as belonging to the same population
eigenvalue rather than using the smallest sample eigenvalue by itself. Doing
the latter would bias your estimate. The testing situation may be different
because the distribution of the sample eigenvalues accounts for this problem
(and in fact, causes the problem). When testing a new sample eigenvalue for
inclusion in a set associated with underlying equal population eigenvalues, you
should include in your test as many sample eigenvalues as you have already
classified as being the same population eigenvalue.
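A small Monte Carlo sketch, assuming NumPy, illustrates the spreading just described. The sizes and the unit noise power are chosen only for illustration. With noise only, the smallest sample-covariance eigenvalue averages well below the population value and the largest averages well above it. The mean of all the sample eigenvalues stays close to the population value.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, sigma2, trials = 20, 6, 1.0, 2000     # illustrative sizes and noise power
smallest, largest, average = [], [], []
for _ in range(trials):
    # Noise-only snapshots: i.i.d. circular complex Gaussian with variance sigma2.
    N = np.sqrt(sigma2 / 2) * (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))
    ev = np.linalg.eigvalsh(N.conj().T @ N / n)   # sample covariance eigenvalues, ascending
    smallest.append(ev[0])
    largest.append(ev[-1])
    average.append(ev.mean())

print(f"mean smallest {np.mean(smallest):.3f}, mean of all {np.mean(average):.3f}, "
      f"mean largest {np.mean(largest):.3f}")
```

The average of all m eigenvalues is the trace divided by m, which is an unbiased estimate of `sigma2`. The two extremes are biased in opposite directions.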

2.2  Specific  Test  Statistics 

If you have an array with many sensors and an environment of only a few
sources, then consider sequential tests of sphericity beginning with the full
matrix. The usual test for sphericity uses the maximum likelihood ratio test
statistic developed by Anderson [24]. Anderson determined the large sample
(asymptotic) distribution of $T_1$.

$$
T_1 = n \ln \left[ \frac{\left( \frac{1}{m-\nu} \sum_{i=\nu+1}^{m} l_i^2 \right)^{m-\nu}}{\prod_{i=\nu+1}^{m} l_i^2} \right] \qquad (2.2)
$$


A form for which a density function might be easier to derive is $T_2$. This is
essentially equations (14) and (23) of Wax, Shan, and Kailath [279].

$$
T_2 = \frac{\prod_{i=\nu+1}^{m} l_i^2}{\left( \frac{1}{m-\nu} \sum_{i=\nu+1}^{m} l_i^2 \right)^{m-\nu}} \qquad (2.3)
$$


For the special case of $m - \nu = 2$, the density function of $T_2$ is given as
equation  6.17,  and  the  cumulative  distribution  function  is  given  as  equation 
6.18.  This  is  the  same  as  the  statistic  u  that  Muirhead  [187]  uses  in  the  case 
of  real  variables.  We  will  see  that  equation  6.15  is  very  similar  to  equation  2.3. 
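Both statistics can be computed directly from the sample eigenvalues. This is a minimal NumPy sketch that follows equations 2.2 and 2.3 as written above. The helper name `sphericity_stats` is hypothetical. Working with logarithms avoids overflow in the product for larger m.

```python
import numpy as np

# Hypothetical helper name; follows the forms of equations 2.2 and 2.3.
def sphericity_stats(eigs, nu, n):
    """Return (T1, T2) for the smallest m - nu of the eigenvalues l_1^2 >= ... >= l_m^2."""
    tail = np.sort(np.asarray(eigs, dtype=float))[::-1][nu:]   # l_{nu+1}^2, ..., l_m^2
    k = tail.size                                              # m - nu
    log_T2 = np.sum(np.log(tail)) - k * np.log(tail.mean())     # ln(prod / mean^k)
    T2 = np.exp(log_T2)     # in (0, 1]; equals 1 only when the tail eigenvalues are equal
    T1 = -n * log_T2        # n ln(mean^k / prod) >= 0 by the AM-GM inequality
    return T1, T2

T1, T2 = sphericity_stats([4.0, 1.0], nu=0, n=10)
# T2 is 4 / 2.5**2 = 0.64 up to rounding, and T1 = -10 ln(0.64)
```

Because $T_1 = -n \ln T_2$, the two carry the same information. They differ only in which form has the more tractable distribution.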

Another statistic to consider is $T_3$ or its inverse. This is suggested by C.
R. Rao [212] (equations 3.10, 3.11, and 17.1).

$$
T_3 = \frac{l_1^2 + \cdots + l_\nu^2}{\operatorname{tr}(B)} \qquad (2.4)
$$

The density of $T_3$ can be obtained by theorem 8. The statistic $T_3$ can be
interpreted as the fraction of the total variance explained by those
eigenvalues attributed to being influenced by the signals. Alternatively, you
could test that the last few eigenvalues explain only a small fraction of the
data as in $T_4$.

$$
T_4 = \frac{l_{\nu+1}^2 + \cdots + l_m^2}{\operatorname{tr}(B)} \qquad (2.5)
$$
The density of $T_4$ can be obtained by theorem 8. As a point of convenience,
note that $\operatorname{tr}(B) = \operatorname{tr}(X^H X) = \operatorname{tr}(X X^H)$. Another useful concept is
to test if the largest $\nu$ eigenvalues are significantly different from the smallest
$m - \nu$ eigenvalues as in $T_5$.

$$
T_5 = \frac{l_1^2 + \cdots + l_\nu^2}{l_{\nu+1}^2 + \cdots + l_m^2} \qquad (2.6)
$$

The density of $T_5$ can be obtained by theorem 8.
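The variance-fraction statistics $T_3$, $T_4$, and $T_5$ are ratios of partial sums of the eigenvalues of B. This is a minimal sketch assuming NumPy, and the function name is hypothetical. Note that $T_3 + T_4 = 1$ and $T_5 = T_3 / T_4$. For a fixed $\nu$ the three therefore carry the same information and differ only in scale, which matters when critical values are chosen.

```python
import numpy as np

# Hypothetical helper name; illustrates equations 2.4 through 2.6.
def variance_fractions(eigs, nu):
    """Return (T3, T4, T5) from the eigenvalues of B, with the largest nu treated as signal."""
    e = np.sort(np.asarray(eigs, dtype=float))[::-1]
    head = e[:nu].sum()          # l_1^2 + ... + l_nu^2
    tail = e[nu:].sum()          # l_{nu+1}^2 + ... + l_m^2
    total = e.sum()              # tr(B)
    return head / total, tail / total, head / tail

T3, T4, T5 = variance_fractions([6.0, 1.0, 2.0, 1.0], nu=1)   # input order does not matter
```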

In  a  real  ocean  environment  with  multipath  propagation,  you  may  want  to 
distinguish  the  direct  (refracted)  path  from  other  paths  using  the  assumption 
that the signal-to-noise ratio along the direct path is greater than along other
paths. This is a bit simplistic, and a more intelligent model could be made.
Then you might want several partitions of $\{l_i\}$ to test on. To really confuse
the  issue,  you  could  go  back  to  the  sample  covariance  matrix  and  perform 
tests  on  selected  entries  in  that  matrix  to  compare  elements  to  each  other  or 
to  known  constants. 

Let the population eigenvalues be denoted by $\Lambda^2 = \operatorname{diag}(\lambda_1^2, \ldots, \lambda_m^2)$. Let
a column vector of real constants $c \in \mathbb{R}^m$ be used to construct linear
combinations of the population and sample eigenvalues. Let $e_j$ be a column vector
in $\mathbb{R}^m$ with all zeros except for a 1 in the $j$th position. We will construct our
choices of various vectors c using sums of selected $e_j$. Construct a general test
statistic $T_6$.

$$
T_6 = \frac{c_1^T L^2 c_1 \,/\, c_1^T \Lambda^2 c_1}{c_2^T L^2 c_2 \,/\, c_2^T \Lambda^2 c_2} \qquad (2.7)
$$

Let the distribution of this test statistic be dependent on n. Denote the
distribution function by $f_{06}$ when $c_1^T \Lambda^2 c_1 = c_2^T \Lambda^2 c_2$. The density of $T_6$ can be
obtained by theorem 8.

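Now, suppose that we want to test if there are exactly $\nu$ sources. Assume
that it is already established that the last $m - \nu$ eigenvalues are identical.
If $\lambda_\nu^2 = \lambda_{\nu+1}^2$, then $\sum_{i=\nu+1}^{m} \lambda_i^2 = (m-\nu)\lambda_\nu^2$. This leads to $c_1 = e_\nu$ and
$c_2 = e_{\nu+1}$. For this selection of $c_1$ and $c_2$ we have met the goal
of $c_1^T \Lambda^2 c_1 = c_2^T \Lambda^2 c_2$. If $c_3 = \sum_{j=\nu+1}^{m} e_j$, we get the relationship
$c_1^T \Lambda^2 c_1 = \frac{1}{m-\nu} c_3^T \Lambda^2 c_3 = c_2^T \Lambda^2 c_2$ when $\lambda_\nu^2 = \lambda_{\nu+1}^2$. When this is true,
the test statistic becomes $T_7$.

$$
T_7 = \frac{(m-\nu)\, c_1^T L^2 c_1}{c_3^T L^2 c_3} \qquad (2.8)
$$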


The density of $T_7$ can be obtained by theorem 8. The null hypothesis is
$H_0: (m-\nu)\, c_1^T \Lambda^2 c_1 \le c_3^T \Lambda^2 c_3$. Written out in terms of the individual population
eigenvalues, this is $H_0: \lambda_\nu^2 \le \lambda_{\nu+1}^2 = \cdots = \lambda_m^2$. The alternate hypothesis is
given by $H_a: \lambda_\nu^2 > \lambda_{\nu+1}^2 = \cdots = \lambda_m^2$. If $T_7 < f_{07(1-\alpha)}$, then do not
reject $H_0$; otherwise reject $H_0$ and choose $H_a$.
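Theorem 8 is not reproduced in this chapter. As a stand-in, this sketch (assuming NumPy) computes $T_7$ and estimates a critical value $f_{07(1-\alpha)}$ by Monte Carlo simulation at the boundary of the null hypothesis. The population eigenvalues passed to `mc_critical_value` are a user assumption, because the null distribution generally depends on them. The simulation substitutes for, and does not reproduce, the thesis's density results. Both function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; simulation stand-in for the theorem 8 density.
def T7(eigs, nu):
    """(m - nu) l_nu^2 / (l_{nu+1}^2 + ... + l_m^2), with eigenvalues sorted descending."""
    e = np.sort(np.asarray(eigs, dtype=float))[::-1]
    m = e.size
    return (m - nu) * e[nu - 1] / e[nu:].sum()

def mc_critical_value(pop_eigs, nu, n, alpha=0.05, trials=2000, seed=0):
    """Monte Carlo (1 - alpha) quantile of T7 for n complex Gaussian snapshots whose
    population covariance is assumed to be diag(pop_eigs)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(np.asarray(pop_eigs, dtype=float) / 2)   # std of real and imaginary parts
    m = sd.size
    stats = np.empty(trials)
    for t in range(trials):
        X = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) * sd
        stats[t] = T7(np.linalg.eigvalsh(X.conj().T @ X), nu)
    return np.quantile(stats, 1 - alpha)

# Boundary of H0 with nu = 2: lambda_2^2 = lambda_3^2 = lambda_4^2 = 1 (illustrative values).
crit = mc_critical_value([10.0, 1.0, 1.0, 1.0], nu=2, n=20)
observed = T7([31.0, 4.5, 1.2, 0.9], nu=2)
reject_H0 = observed >= crit
```

Because $l_\nu^2$ is at least as large as every trailing sample eigenvalue, $T_7 \ge 1$ always, so any sensible critical value exceeds 1.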

Another desirable question is to ask if there are no more than $\nu$ sources.
Suppose you have concluded that there are exactly $\nu$ sources. Then the best
estimator of X is given by $\hat{X}_\nu = \sum_{i=1}^{\nu} l_i P_i Q_i^H$. What is left over should be due
to noise alone and should therefore be spherical. Let $Y = \sum_{i=\nu+1}^{m} l_i P_i Q_i^H$. The
statistic for testing the sphericity of $Y^H Y$ is given by $T_8$.

$$
T_8 = n \ln \left[ \frac{\left( \frac{1}{m-\nu} \sum_{i=\nu+1}^{m} l_i^2 \right)^{m-\nu}}{\prod_{i=\nu+1}^{m} l_i^2} \right] \qquad (2.9)
$$

Let $f_{08}(m-\nu, n)$ be the distribution of the test statistic $T_8$ when $\lambda_{\nu+1}^2 = \cdots = \lambda_m^2$.
The null hypothesis is $H_0: \lambda_{\nu+1}^2 = \cdots = \lambda_m^2$. The alternate hypothesis is $H_a$:
one or more of the $\lambda_i^2$ are different from the rest, or equivalently, not all the $\lambda_i^2$
are equal. If $T_8 < f_{08(1-\alpha)}(m-\nu, n)$, then do not reject $H_0$. Otherwise, reject
$H_0$ and conclude $H_a$.
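The sketch below, assuming NumPy, forms the residual Y from the SVD. It confirms that the nonzero eigenvalues of $Y^H Y$ are exactly $l_{\nu+1}^2, \ldots, l_m^2$, so $T_8$ can be computed from the trailing sample eigenvalues without forming Y explicitly. The function name and the random test data are illustrative.

```python
import numpy as np

# Hypothetical helper name; illustrates equation 2.9.
def residual_sphericity(X, nu):
    """Return T8, the nonzero eigenvalues of Y^H Y, and l_{nu+1}^2..l_m^2 for comparison."""
    n, m = X.shape
    P, l, QH = np.linalg.svd(X, full_matrices=False)
    Y = (P[:, nu:] * l[nu:]) @ QH[nu:, :]                    # sum_{i>nu} l_i P_i Q_i^H
    ev = np.linalg.eigvalsh(Y.conj().T @ Y)[::-1][: m - nu]  # drop the nu (near-)zero eigenvalues
    k = ev.size
    T8 = n * (k * np.log(ev.mean()) - np.sum(np.log(ev)))
    return T8, ev, l[nu:] ** 2

rng = np.random.default_rng(3)
X = (rng.standard_normal((25, 5)) + 1j * rng.standard_normal((25, 5))) / np.sqrt(2)
T8, ev, trailing = residual_sphericity(X, nu=2)
```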

Suppose that the noise covariance matrix R is known. Then we want to
find the rank of the matrix $W = B - R$. A direct approach has a problem
when all the tested eigenvalues of W are zero, in that the corresponding
distribution density functions become undefined. However, this is precisely
the case we want to examine. Alternatively, let the eigenvalue decomposition
of W be $W = Q L^2 Q^H$. Then, let the eigenvalue decomposition of R be given by
$V D^2 V^H$. We can test if the last $m - \nu$ eigenvalues of B equal the last $m - \nu$
eigenvalues of R. Define the test statistic $T_9$.

$$
T_9 = \frac{c_1^T L^2 c_1}{c_1^T D^2 c_1} \qquad (2.10)
$$

The density of $T_9$ can be obtained by theorem 8. Let $f_{09}(n)$ be the distribution
of $T_9$ when $c_1^T \Lambda^2 c_1 = c_1^T D^2 c_1$ is true. Let $D^2 = \operatorname{diag}(d_1^2, \ldots, d_m^2)$. Then the
null hypothesis is $H_0: \lambda_\nu^2 = d_\nu^2$ and the alternate hypothesis is $H_a: \lambda_\nu^2 \ne d_\nu^2$.
If $f_{09(\alpha/2)}(n) \le T_9 < f_{09(1-\alpha/2)}(n)$, then do not reject $H_0$; otherwise conclude
that $H_0$ is rejected and therefore choose $H_a$. When $H_0$ is true, B is not of rank $\nu$,
and there are not $\nu$ significant sources. When $H_0$ is false, we reject $H_0$, and
by default choose $H_a$, concluding that there are $\nu$ significant sources.

We cannot use the sphericity test on W because all the tested eigenvalues
are zero under the null hypothesis, and the density function of the test
distribution possibly will not exist. We can test that there are no more than $\nu$
sources by comparing the sums of eigenvalues of B and the sums of eigenvalues
of R. Assume that there are no more than sources. In practice, this
should not be a problem for the proposed test. The null hypothesis is given
by $H_0: \lambda_{\nu+1}^2 + \cdots + \lambda_m^2 = d_{\nu+1}^2 + \cdots + d_m^2$ and the alternate hypothesis by
$H_a$: equality does not hold. For this problem, let $c_1 = \sum_{i=\nu+1}^{m} e_i$ and compute
test statistic $T_9$. If $T_9 < f_{09(1-\alpha)}(n)$, then do not reject $H_0$; otherwise reject $H_0$
and conclude $H_a$. We are not looking at $\lambda_\nu^2$. If the last $m - \nu$ eigenvalues of B
equal those of R, then the rank of W is less than $\nu + 1$. Therefore, the rank
of W is no more than $\nu$.
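This is a minimal sketch of $T_9$ with $c_1 = \sum_{i=\nu+1}^{m} e_i$, assuming NumPy. It takes the $l_i^2$ to be eigenvalues of B, since the surrounding test compares B with R. It also assumes that B has already been scaled to the same units as R, for example $B = X^H X / n$. Both are assumptions of the sketch, and the function name is hypothetical.

```python
import numpy as np

# Hypothetical helper name; assumes B is on the same scale as R.
def T9_trailing(B, R, nu):
    """Ratio of the sums of the m - nu smallest eigenvalues of B and of R."""
    lB = np.sort(np.linalg.eigvalsh(B))[::-1]    # l_1^2 >= ... >= l_m^2
    dR = np.sort(np.linalg.eigvalsh(R))[::-1]    # d_1^2 >= ... >= d_m^2
    c1 = np.zeros(lB.size)
    c1[nu:] = 1.0                                # c_1 = e_{nu+1} + ... + e_m
    return (c1 @ lB) / (c1 @ dR)

B = np.diag([5.0, 2.0, 1.0, 1.0])
R = np.eye(4)
stat = T9_trailing(B, R, nu=2)   # approximately 1: trailing eigenvalues of B match those of R
```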

Suppose now that you do not know R, but you have an estimate of R, which
we will call S. Then we want to find the rank of the matrix $V = B - S$. Let
the eigenvalue decomposition of V be $V = Q L^2 Q^H$. You would like to know
if the last $m - \nu + 1$ eigenvalues of V are small enough to be considered zero.
Define the test statistic $T_{10}$.

$$
T_{10} = \sum_{i=\nu}^{m} l_i^2 \qquad (2.11)
$$

Let $f_{10}(n)$ be the distribution of $T_{10}$ when $c_1^T \Lambda^2 c_1 = 0$ is true. Then the null
hypothesis is $H_0: \lambda_\nu^2 = 0$ and the alternate hypothesis is $H_a: \lambda_\nu^2 \ne 0$. If
$T_{10} < f_{10(1-\alpha)}(n)$, then do not reject $H_0$; otherwise conclude that $H_0$ is rejected
and therefore choose $H_a$. When $H_0$ is true, B is not of rank $\nu$, and there are not
$\nu$ significant sources. When $H_0$ is false, we reject $H_0$, and by default choose
$H_a$, concluding that there are at least $\nu$ significant sources.
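Because S is only an estimate, $V = B - S$ can have negative eigenvalues. The "$l_i^2$" in (2.11) is therefore best read as the eigenvalues of V rather than as literal squares. This is a minimal NumPy sketch, and the function name and matrices are hypothetical.

```python
import numpy as np

# Hypothetical helper name; illustrates equation 2.11.
def T10(B, S, nu):
    """Sum of the eigenvalues of V = B - S from the nu-th (1-based) through the m-th."""
    v = np.sort(np.linalg.eigvalsh(B - S))[::-1]   # descending; may include negatives
    return v[nu - 1:].sum()

B = np.diag([6.0, 3.0, 1.2, 1.0])
S = np.eye(4)                     # stand-in for an estimated noise covariance
stat = T10(B, S, nu=3)            # eigenvalues of V are 5, 2, 0.2, 0, so this is 0.2 + 0
```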

A sequential test for rank that begins with the largest eigenvalue may be
practical in systems with a large number of sensors and a few expected sources.
The idea is to test the ratio of the largest $\nu$ eigenvalues to the sum of all the
eigenvalues, as given by statistic $T_{11}$.

$$
T_{11} = \frac{\sum_{i=1}^{\nu} l_i^2}{\sum_{i=1}^{m} l_i^2} \qquad (2.12)
$$

The density of $T_{11}$ can be obtained by theorem 8. A sphericity test could
similarly be constructed for the eigenvalues yet to be estimated, such as $T_{12}$.

$$
T_{12} = n \ln \left[ \left( \frac{\operatorname{tr}(B) - \sum_{i=1}^{\nu} l_i^2}{m-\nu} \right)^{m-\nu} \frac{\prod_{i=1}^{\nu} l_i^2}{\det(B)} \right] \qquad (2.13)
$$
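$T_{12}$ needs only $\operatorname{tr}(B)$, $\det(B)$, and the $\nu$ largest eigenvalues, which is presumably what makes it attractive when only the leading eigenvalues have been computed. Since $\det(B) = \prod_{i=1}^{m} l_i^2$ and $\operatorname{tr}(B) = \sum_{i=1}^{m} l_i^2$, it coincides algebraically with $T_1$. The NumPy sketch below checks this numerically and also computes $T_{11}$. The function names and random data are hypothetical. It uses `numpy.linalg.slogdet` for the log-determinant to avoid overflow.

```python
import numpy as np

# Hypothetical helper names; illustrate equations 2.12 and 2.13.
def T11(eigs, nu):
    """Fraction of tr(B) carried by the nu largest eigenvalues."""
    e = np.sort(np.asarray(eigs, dtype=float))[::-1]
    return e[:nu].sum() / e.sum()

def T12(B, largest, n):
    """T12 from tr(B), det(B), and the nu largest eigenvalues l_1^2..l_nu^2 of B."""
    m, nu = B.shape[0], len(largest)
    largest = np.asarray(largest, dtype=float)
    tail_mean = (np.trace(B).real - largest.sum()) / (m - nu)
    _, logdet = np.linalg.slogdet(B)               # log|det(B)|; B is positive definite here
    return n * ((m - nu) * np.log(tail_mean) + np.sum(np.log(largest)) - logdet)

rng = np.random.default_rng(4)
n, m, nu = 40, 6, 2
X = (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))) / np.sqrt(2)
B = X.conj().T @ X
eigs = np.sort(np.linalg.eigvalsh(B))[::-1]
t12 = T12(B, eigs[:nu], n)
t1 = n * ((m - nu) * np.log(eigs[nu:].mean()) - np.sum(np.log(eigs[nu:])))  # T1, eq. 2.2
```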


Of course, it is also possible to consider the difference between adjacent
sample eigenvalues $l_j^2 - l_{j+1}^2$ or the ratio $l_j^2 / (\beta l_{j+1}^2)$, where $\beta$ is some real constant
of proportionality of interest. If you expect $l_{j+1}^2$ to be a variance associated
with noise and $l_j^2$ to be a signal-plus-noise variance, then you may be interested
in $\beta = 10^{0.1}$ to correspond to a 1 dB $(S+N)/N$ ratio, or $\beta = 10^{0.3}$ for a 3 dB
ratio. Similar statistics between any two sample eigenvalues of interest may
also be appropriate, such as $l_i^2 - l_j^2$ or $l_i^2 / (\beta l_j^2)$. Values of j that may be of
special interest are 1 and p. You may want to use the average of the sample
eigenvalues $\bar{a} = \frac{1}{p} \sum_{i=1}^{p} l_i^2$ instead of $l_j^2$. All test statistics (except those of the form of
$T_1$) are heuristically based, drawing from background in regression analysis.
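Because the eigenvalues are variances (powers), a decibel threshold translates into $\beta = 10^{\text{dB}/10}$. The sketch below, assuming NumPy and hypothetical function names, computes the adjacent ratios $l_j^2 / (\beta l_{j+1}^2)$ and flags those that exceed 1. A flag is only a heuristic screen, not a calibrated test.

```python
import numpy as np

# Hypothetical helper names; illustrate the adjacent-ratio heuristic.
def db_to_beta(db):
    """Power ratio for a given number of decibels: 1 dB -> 10**0.1, 3 dB -> 10**0.3."""
    return 10.0 ** (db / 10.0)

def adjacent_ratios(eigs, db):
    """Return l_j^2 / (beta l_{j+1}^2) for j = 1..m-1 and a flag where it exceeds 1."""
    e = np.sort(np.asarray(eigs, dtype=float))[::-1]
    ratios = e[:-1] / (db_to_beta(db) * e[1:])
    return ratios, ratios > 1.0

ratios, flags = adjacent_ratios([10.0, 4.0, 1.1, 1.0], db=3)
# flags: [True, True, False] -- gaps above 3 dB follow l_1^2 and l_2^2 only
```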

In  this  chapter,  the  problem  has  been  cast  as  a  problem  in  complex  principal 
components  analysis.  A  number  of  testing  situations  with  their  commonly 
known  test  statistics  have  been  presented.  The  goal  of  this  thesis  is  to  develop 
the density functions for these statistics. We will see some more easily derived
density  functions  for  closely  related  tests  presented  during  our  search  for  the 
densities  of  the  stated  tests.  This  is  a  problem  in  dimensionality  reduction. 


Chapter  3 
OTHER APPLICATIONS

The purpose of this chapter is to briefly review applications of system
identification in which the eigenvalue-based hypothesis testing approach might be
used. In addition, applications are suggested for acoustic emission analysis
and acoustical oceanography. These are areas in which I am interested and in
which I have not seen evidence of order determination being applied. Even though
the abstract problem has its own beauty, this chapter shows that it is also useful.
Application to acoustic emission analysis will be discussed in section 3.1 and
acoustical oceanography will be discussed in section 3.2. This introduction
briefly identifies a variety of other applications.

Goodman  was  an  early  pioneer  in  the  distribution  theory  and  application  of 
complex  random  variables.  He  reported  that  geophysicists  treat  simultaneous 
measurements  at  several  positions  in  the  ocean  of  the  height  of  gravity  waves 
generated  by  the  wind  as  multivariate  complex  normal  records  [92]. 

Krishnaiah and Waikar [147] reported that the distributions of the
intermediate roots can be used for reduction of dimensionality in pattern
recognition problems and principal component analysis. In nuclear physics, the
distributions of any few consecutive ordered roots are useful for finding the
distributions of the spacings between the energy levels of certain complicated
systems [53][174]. Krishnaiah and Waikar referenced Wigner with regard to
applications in physics [284][285][287]. Krishnaiah and Schuurmann [151]
applied methods similar to those developed in the present research to vertical
and horizontal accelerometer data to examine the vibration at different
locations of the cargo deck on a C-5A aircraft. They also referenced Cooper
and Cooper's work [60] in non-supervised signal detection and pattern
recognition. Horel [111] wrote a very nice article on the theory and practice of
complex principal component analysis. He said it has been shown to be a
useful method for identifying traveling and standing waves in geophysical data
sets. The frequency domain principal component (FDPC) analysis is the most
general of the available methods of studying propagating phenomena. Complex
principal component (CPC) analysis in the time domain is considered an
attractive alternative to FDPC analysis. CPC analysis is essentially FDPC
analysis averaged over all frequency bands.

Krishnaiah [150] references Liggett's work [166] in passive sonar and Priestly's
work [209] in system identification. Kelly et al. [134] applied concepts of
statistics of complex variables in an active sonar acoustic imaging problem where
noise $n(t)$ was distributed according to $\mathcal{CN}(0, \Sigma)$ where $\Sigma = \sigma^2 I$. Tague
[264]  used  concepts  developed  during  this  thesis  research  for  evaluating  the 
signal-to-noise  ratio  of  a  beamformer  output.  The  complex  matrix  normal 
distribution, whose form is verified in this thesis, is the natural setting for
beginning analysis of two-dimensional spatial data such as found in rectangular
sonar arrays.


The solution to the problem of this thesis is also the solution to some
applications involving remote sensing, such as data compaction. It will allow
automation of a wide range of analyses now requiring application area
specialists. The list of areas to which these methods apply is growing as people
discover how to work with complex random variables. For some other
applications, see references [165] and [128]. Other references to the statistics of
complex variables include [30][70][132][133][127][62][178][179].

3.1  Acoustic  Emission  Analysis 

Acoustic  emission  testing  is  the  detection,  location,  and  analysis  of  acoustic 
emissions  from  materials  under  static  or  dynamic  stress.  The  term  “acoustic 
emission" (AE) refers to the class of phenomena whereby transient elastic
waves  are  generated  by  the  rapid  release  of  energy  from  localized  sources  within 
a  material,  or  the  transient  elastic  waves  so  emitted.  Other  (less  preferred) 
terms  used  for  the  same  phenomena  are  “stress  wave  emission”  (SWE)  and 
“microseismic  activity”.  Standard  definitions  for  terms  relating  to  acoustic 
emission  are  given  in  reference  [32]. 

Short  [243]  noted  that  the  first  major  systematic  approach  to  acoustic 
emission  of  materials  under  stress  was  by  Kaiser.  Kaiser  concluded  that  the 
number  of  emissions  increased  with  the  applied  stress,  and  that  after  unloading 
there was no acoustic emission upon reloading until the previous maximum
load was exceeded. This is known as the Kaiser effect, and is observed both
in metals and composites at low loads. If a composite is unloaded without first
being held at a load in the elastic region until all emissions have stopped,
emissions on reloading then occur at a load lower than the previous maximum load.

Acoustic  emissions  are  detected  using  one  or  more  transducers,  usually 
piezoelectric transducers, to obtain an electric signal proportional to the
mechanical vibration at the location of the transducer. An array of transducers
is  required  to  locate  the  source  of  an  emission  by  comparing  the  arrival  times 
of  acoustic  transients  at  each  transducer.  A  multichannel  analyzer  is  used  to 
cross-correlate  the  signals  in  the  time  domain. 

An  acoustic  emission  may  be  identified  by  its  signature  in  the  time  and 
frequency  domain.  Within  the  time  domain,  the  important  parameters  are 
the  amplitude  rise  time  and  emission  duration.  Emissions  are  also  classified  by 
their  frequency  spectrum.  Together,  emissions  are  characterized  by  their  time- 
dependent  frequency  distribution.  This  is  a  function  of  the  type  of  material, 
geometry,  structures  coupled  to  it,  the  applied  stress,  and  the  mechanism 
producing  the  emission. 

AE  testing  is  still  in  its  infancy.  Theoretical  work  lags  far  behind  its  use 
in  practical  applications.  A  very  basic  open  question  is  why  growing  cracks  in 
some  materials  emit  many  AEs  while  in  other  materials  growing  cracks  emit 
hardly any AEs [168]. A procedure for AE testing for fiberglass reinforced
plastic  tanks  is  contained  in  [33]. 

Applications  require  recognition  that  sound  propagation  is  dispersive.  When 
the structure is liquid loaded, the analysis must also recognize that there is
coupling  between  propagation  modes.  A  very  short  time  after  emission,  most 
(over  93%)  of  the  energy  is  in  bending  waves,  which  means  that  a  surface 
mounted  transducer  will  be  effective  as  a  sensor  [77]. 

Triangulation, which works well for a point source, is not optimal for a
source that is spatially extended or for multiple sources
[195]. Triangulation search for emission sites is time consuming and makes
poor  use  of  the  data.  Alternatives  include  the  use  of  surface  mounted  arrays. 
This  overcomes  the  problem  of  detecting  and  locating  multiple  sources,  and 
mapping  of  sources  that  are  not  small  enough  to  be  considered  point  sources. 
This  approach  was  examined  by  Simaan  et  al.  [244].  The  authors  assumed  a 
constant  speed  of  sound  and  thus  treated  only  longitudinal  waves.  However, 
by  processing  signals  at  a  selected  frequency,  these  concepts  can  be  applied 
to  bending  (transverse)  waves.  By  doing  this  at  several  frequencies,  an  added 
benefit  is  that  the  time-dependent  frequency  signature  at  the  source  location 
can  be  reconstructed  which  aids  the  classification  of  the  type  of  emission. 

AE data is very noisy. The number of sources is typically determined by the
judgment of the engineer in post-processing of the data. Removal of the
analysis engineer from the immediate test environment restricts
the  ability  to  take  timely  action  for  follow-up  testing,  or  examination  of  the 
test  environment  for  explanations  of  the  signal  that  may  be  due  to  something 
possibly obvious to an on-site observer. Thus, in situ automated detection and
emission site determination not only increase the efficiency of the testing, but
also allow observation of causes that otherwise would escape notice or explanation.

Because AE data is noisy, all the eigenvalues of the covariance matrix will be
nonzero. It is critical to determine which of the eigenvalues are associated
with  AE  signals  and  which  are  associated  with  noise.  When  the  ratio  between 
adjacent  eigenvalues  is  large,  making  this  judgment  by  merely  examining  the 
eigenvalues  without  other  processing  is  appropriate.  Almost  always,  the  large 
eigenvalues  will  be  associated  with  an  AE  of  interest,  and  the  small  ones  will 
be  associated  with  noise.  When  the  ratio  between  adjacent  eigenvalues  is  not 
large,  then  it  is  more  difficult  to  make  the  judgment  without  a  more  formal 
approach. 

In traditional AE testing, a tank or vessel is subjected to artificially
induced forces to place the material under enough stress to produce emissions
at existing flaws. For example, a tank might be pressurized well above its
normal operating pressure. Enough emissions are produced, and the monitoring
period  is  long  enough,  that  the  signal-to-noise  ratio  is  large  enough  to  produce 
a  detectable  and  usable  signal. 
There are circumstances where this might be undesirable. For example,
traditional methods of applying the stress might be very expensive,
time consuming, or hazardous. This might be the case for testing the hull of a ship.
The  cost  of  testing  might  be  significantly  less  by  not  requiring  the  ship  to 
enter  dry  dock.  By  decreasing  the  required  signal-to-noise  ratio,  it  might  be 
possible  to  use  the  normal  operating  forces  of  the  industrial  process,  or  the 
forces  of  nature,  to  provide  the  stress-inducing  force  needed  to  produce  AE 
events  under  usually  safe  conditions. 

If  the  industrial  process  is  critical  and  possibly  hazardous  if  corrective 
action  is  not  taken  shortly  after  the  onset  of  a  failure,  continuous  monitoring 
might  be  desirable.  This  means  that  monitoring  must  be  done  under  normal 
operating  conditions,  which  might  not  usually  induce  stresses  large  enough  to 
generate  enough  high  level  emissions  to  be  detectable  by  present  means.  An 
alternative  monitoring  technology  is  to  embed  or  coat  the  object  with  optical 
fibers  or  very  thin  wires.  When  a  crack  occurs  in  the  material,  the  fiber  or 
wire  breaks,  detecting  the  existence  of  the  first  crack.  The  problem  with  this 
method  is  that  only  the  first  crack  along  the  filament  is  detectable.  Subsequent 
cracks  along  that  filament  are  not  detectable.  A  field  monitoring  technique, 
such  as  EM  detection,  is  required.  Monitoring  a  nuclear  reactor  vessel  might 
be  such  an  application. 

Another motivation for detecting AE events at a lower
signal-to-noise ratio is to increase the area under effective testing. This could
speed  testing  of  very  large  structures  and  thus  decrease  costs.  This  might  be 
the  case  for  large  natural  gas  tanks  or  pipelines. 


3.2 Acoustical Oceanography

The problem addressed in this thesis is the same as the problem of determining
the number of different arrival angles at a vertical line array. In a propagation
loss experiment, for a fixed frequency, each arrival angle can be associated with
a different propagation mode. By determining the vertical directions of arrival
of a test signal during propagation loss experiments, it is possible to determine
more precisely the energy distribution of sound among the propagating modes.
Such examination is useful in situ to determine the adequacy of the hypothesized
propagation loss model used in planning the experiment, to judge whether the
propagation conditions are acceptable for continuing the experiment as planned,
and to plan the source placement for additional samples if any are needed to
meet the experimental goals. When transients are used as sources, it is
necessary to determine the number of received modes and their directions of
arrival at a given frequency from only a few samples. Verifying mode presence
early in an experiment, and accounting for actual environmental conditions,
allows the sensor depths to be adjusted to construct a mode filter for the
remainder of the experiment. This will increase the signal-to-noise ratio,
allowing better data capture and analysis.

The general statistical techniques developed in the course of research for
this thesis can be applied to the multivariate analysis of ambient noise when
environmental parameters are also recorded. Krishnaiah [150] notes that
problems of testing hypotheses on complex multivariate populations play an
important role in drawing inferences on multiple stationary Gaussian time
series, since certain suitably defined sample spectral density matrices of
these time series are approximately distributed as complex Wishart matrices.
Jobst and Adams [122] studied the statistics of ambient sea noise using two
deep arrays in the North Atlantic separated in depth and by several miles.
They reported that statistical tests showed most observations of narrow-band
noise to be consistent with the hypothesis that the in-phase and quadrature
components of ambient noise are zero-mean Gaussian processes with equal
power. Noise power is locally homogeneous over the array aperture, and
stationary for periods up to 22 minutes at 75 Hz. As a function of frequency,
narrow-band ambient noise measurements are consistent with the hypothesis of
constant power in adjacent bands up to 0.22 Hz wide. When the analyses were
extended to 0.8 Hz bands, the noise power was no longer constant.

Matsumoto [172] (p. 358) assumed isotropic Gaussian noise in reporting on
characteristics of Sea MARC II phase data. McDaniel [173] considered the
underside of the Arctic ice canopy to have a zero-mean Gaussian height
distribution with an rms roughness of 1-2 meters for the purpose of modeling
high-frequency forward scattering.

It is cautioned that the distributions that noise sources obey do vary
according to their cause. For example, wind-driven sea surface noise has a
different distribution than noise due to long-range shipping. Further, both will
be distributed differently than noise from snapping shrimp on a shallow ocean
floor, porpoise and whale whistles and clicks in the ocean volume, or noise
generated by the oil industry on the ocean floor.


Chapter 4
OTHER APPROACHES

4.1 General Discussion

The purpose of this chapter is twofold. First, it presents a setting in which
order estimation and parameter estimation are subsumed into one approach.
In the abstract setting, the problem reduces to finding, among all the candidate
probability measures, the probability measure that “best” explains the data.
The second purpose is to present a very brief catalog of methods for order
determination other than the one examined in this thesis.

There are other approaches to model order identification. Methods traditional
to statisticians can be found in statistics texts on linear models, since the
question arises often when building regression models. Some techniques used
for order determination in regression models include the maximum squared
correlation, Mallows' $C_p$, forward stepwise variable inclusion, backward
stepwise variable exclusion, and other criteria. Soderstrom [250] considered
the use of Wilks' likelihood ratio statistic and the F-test for comparing two
competing models. Prasad and Chandna [208] hint at the use of canonical
correlation between array subsets, where their application is bearing
measurement. Methods traditionally used for model order determination in the
context of linear regression analysis can be found in Neter and Wasserman [190].
Most of the comments made in this section are taken from one or more of the
references cited.

Kundu [158] discusses the application of Cross Validation. The Cross Validation
approach was studied by Lachenbruch (1975), Stone (1974 and 1977), Dawid
(1974), and C. R. Rao (1988). Cremona and Brandon [63] refer to a Singular
Value Plot criterion Rt. Others have looked at Jackknife and Bootstrap
procedures. Bouvet [41] considered a Bayesian approach.

Recent developments in electrical engineering model order determination
concentrate on techniques based on information-theoretic criteria. These
techniques are usually referenced in the literature by their initials rather
than by their full titles. An ancestor of these methods can be seen in the 1954
book by Savage [232] (p. 235 ff.). He considers the evaluation of information
given two neighboring values of the parameter of an estimation problem. He
uses the concept of differential information, which he says is even older than
Fisher's information.

The recent motivation for the information-theoretic approach is based on
the work of Akaike. His work is traceable to 1968, and he continued publishing
at least as late as 1979. A list of 22 of his publications ([2] through [23]),
gleaned from other papers referencing his work, appears in the bibliography.
It was his innovative 1974 paper [16], discussing his method known as AIC
(Akaike Information Criterion), that is primarily responsible for the tremendous
subsequent worldwide activity in the information-theoretic approach. Akaike
explains (Section V, p. 719 of [16]) that IC stands for information criterion
and A is added so that similar statistics, BIC, DIC, etc., may follow.
Rissanen [224][225] and Schwartz [239] developed the MDL (Minimum
Description Length) method in 1978 and 1982. In 1986, Zhao, Krishnaiah, and
Bai [297][298] derived a statistically consistent generalization of AIC, which
is called EDC (Efficient Detection Criterion) or GIC (General Information
Criterion). In 1989, C. R. Rao and Y. Wu [219] proposed two discriminant
criteria that are strongly consistent. Other methods are CAT (Criterion
Autoregressive Transfer), by Parzen in 1974 [203], and FPE (Final Prediction
Error). An ad hoc method is NEE (Noise Error Estimation).
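
As a concrete point of reference, the sketch below applies AIC and MDL to the
eigenvalues of a sample covariance matrix in the form popularized by Wax and
Kailath (1985) for counting signal sources at an array. That paper is not among
the references discussed here, and the function name `wax_kailath_order` is
hypothetical. For each candidate number of sources $k$, the likelihood term
compares the geometric and arithmetic means of the $p - k$ smallest eigenvalues,
and $k(2p - k)$ counts the free parameters for complex-valued data.

```python
import numpy as np


def wax_kailath_order(eigvals, n_snapshots):
    """Estimate the number of sources by AIC and MDL (Wax-Kailath form).

    eigvals: eigenvalues of a p x p sample covariance matrix (any order).
    n_snapshots: number N of independent snapshots used to form it.
    Returns (k_aic, k_mdl).
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    p, n = lam.size, n_snapshots
    aic, mdl = [], []
    for k in range(p):  # candidate number of signals
        tail = lam[k:]  # the p - k presumed noise eigenvalues
        geo = np.exp(np.mean(np.log(tail)))  # geometric mean
        ari = np.mean(tail)  # arithmetic mean
        # Negative log-likelihood term; zero when the tail eigenvalues are equal.
        nll = -n * (p - k) * np.log(geo / ari)
        free = k * (2 * p - k)  # free parameters, complex-valued case
        aic.append(2.0 * nll + 2.0 * free)
        mdl.append(nll + 0.5 * free * np.log(n))
    return int(np.argmin(aic)), int(np.argmin(mdl))


# Two strong eigenvalues above a nearly flat noise floor.
print(wax_kailath_order([10.0, 5.0, 1.01, 0.99, 1.0, 1.0], n_snapshots=100))  # (2, 2)
```

MDL penalizes extra parameters by $\tfrac{1}{2}\log N$ rather than by a constant,
which is why it tends to choose fewer sources than AIC as $N$ grows.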

4.2 Generalized Maximum Likelihood Estimators

4.2.1 Introduction

If you choose the best probability measure to fit your random sample, then you
have determined the order of your system. Thus, we seek the measure that has
a covariance matrix of the right rank and also the proper parameter values if
the distribution family considered is parameterized. Note that this is stronger
than just determining the order of a system.

More abstractly, families of distributions with covariance matrices of different
ranks, taken together, merely form a larger family of measures from which
to choose. This can also be extended to sets of different kinds of distributions,
such as considering the normal and Poisson distributions simultaneously. In
fact, in a parameterized family of distributions, each fixed parameter value
gives an entirely different distribution. Except for computational convenience,
there is no reason to consider parameters explicitly when finding a maximum
likelihood estimator. A maximum likelihood estimator is merely the selection
of the measure, from among all the measures you are allowed to look at, that
best fits the data from your random sample. Thus, you can even consider an
unparameterized class of measures. You might properly argue that in
establishing sequences, the imposed indexing becomes a parameter even though
the index does not appear as part of a functional expression of the distribution.
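
To make the point concrete, the sketch below treats maximum likelihood as
picking the best-fitting member of a small, unparameterized list of candidate
measures. The candidates are hypothetical and chosen only for illustration. All
of them have densities with respect to the same dominating measure (Lebesgue
measure), which is what makes their likelihoods comparable, and for an
independent sample the log-likelihood is the sum of log-densities.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log of the normal density with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


# Hypothetical candidate measures with no common parameterization.
CANDIDATES = {
    "N(0,1)": lambda x: gaussian_logpdf(x, 0.0, 1.0),
    "N(2,1)": lambda x: gaussian_logpdf(x, 2.0, 1.0),
    "N(0,3)": lambda x: gaussian_logpdf(x, 0.0, 3.0),
    "Laplace(0,1)": lambda x: -math.log(2.0) - abs(x),
}


def select_measure(sample, candidates):
    """Return the candidate with the largest log-likelihood, plus all scores."""
    scores = {name: sum(logf(z) for z in sample) for name, logf in candidates.items()}
    return max(scores, key=scores.get), scores


best, scores = select_measure([1.9, 2.1, 2.0, 1.8, 2.2], CANDIDATES)
print(best)  # N(2,1)
```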

The following discussion decodes remarks by Kiefer and Wolfowitz (pp. 892-893)
[140] on several ways of generalizing maximum likelihood estimators.
The first set of generalizations treats the case in which the supremum of the
likelihood is not attained by any member of the allowable set. The second set
of generalizations repeats the first, but additionally uses the Radon-Nikodym
derivative as a generalized probability density function. Taken together, these
approaches extend the classes of functions for which a maximum likelihood
estimator can be obtained. Application of these concepts to the order
determination problem was suggested by C. R. Rao [215].

The close reader will observe that the application of these ideas, where the
allowable set of underlying covariance matrices are of different ranks and from
different complex Wishart distributions, may be problematic. At issue is that
all the measures under consideration must be defined on the same $\sigma$-algebra.
In the idealized case, you end up wanting to consider measures with different
$\sigma$-algebras. For example, if you consider a singular bivariate distribution
in $\mathbb{R}^2$, the two-dimensional Lebesgue measure of a line is zero. Either
you decide that the offending set is allowable, albeit of measure zero, or you
decide that such a set is not in the $\sigma$-algebra. Under the first
interpretation, the following theory applies. Under the second interpretation,
it does not. The physical world is much nicer because we never have the case of
a truly deficient covariance matrix; the problem becomes one of testing for
significant differences. This is, therefore, one case where the abstraction of
an idea actually produces an approach that is very practical.
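
A small numerical illustration of the last point, using assumed data: points
lying exactly on a line in $\mathbb{R}^2$ give a sample covariance matrix whose
smallest eigenvalue is zero up to rounding, whereas even slight added noise
makes it strictly positive. The practical question is then whether a small
eigenvalue differs significantly from the others. The sample size, noise level,
and seed are arbitrary choices.

```python
import numpy as np


def eigen_spread(data):
    """Eigenvalues (ascending) of the sample covariance of a 2 x n data array."""
    return np.linalg.eigvalsh(np.cov(data))


rng = np.random.default_rng(seed=0)
s = rng.standard_normal(50)
clean = np.vstack([s, 2.0 * s])  # every point lies on the line y = 2x
noisy = clean + 1e-3 * rng.standard_normal(clean.shape)  # small sensor noise

lam_clean = eigen_spread(clean)
lam_noisy = eigen_spread(noisy)
print(lam_clean)  # smallest eigenvalue is zero up to rounding error
print(lam_noisy)  # smallest eigenvalue is small but strictly positive
```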


4.2.2 Lebesgue-Radon-Nikodym Theorem

In this section we present a statement of the subject theorem and define the
terms that will be used in the study of the likelihood estimators of Kiefer and
Wolfowitz. This material is from Rudin [230]. We begin with a few definitions.
Let $\mu$ be a positive $\sigma$-finite measure on a $\sigma$-algebra $\mathcal{M}$
in a set $X$, and let $\lambda$ be a complex measure on $\mathcal{M}$. Then


Definition 1 $\lambda \ll \mu$ ($\lambda$ is absolutely continuous with respect to $\mu$) means $\lambda(E) = 0$ for all sets $E \in \mathcal{M}$ for which $\mu(E) = 0$.

Definition 2 If there is an $A \in \mathcal{M}$ such that $\lambda(E) = \lambda(A \cap E)$ for every
$E \in \mathcal{M}$, then we say $\lambda$ is concentrated on $A$.

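Definition 3 Let $\lambda_1$, $\lambda_2$ be measures on $\mathcal{M}$, and let $A, B \in \mathcal{M}$ with $A \cap B =
\emptyset$ (the empty set), where $\lambda_1$ is concentrated on $A$ and $\lambda_2$ is concentrated on $B$.
Then $\lambda_1$ and $\lambda_2$ are mutually singular, and we write this condition as $\lambda_1 \perp \lambda_2$.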

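Theorem 1 (The theorem of Lebesgue-Radon-Nikodym.) Let $\mu$ be a positive
$\sigma$-finite measure on a $\sigma$-algebra $\mathcal{M}$ in a set $X$, and let
$\lambda$ be a complex measure on $\mathcal{M}$. Then

(a) There is a unique pair of complex measures $\lambda_a$ and $\lambda_s$ on
$\mathcal{M}$ such that

$$\lambda = \lambda_a + \lambda_s \qquad (\lambda \text{ is partitioned})$$

$$\lambda_a \ll \mu \qquad (\lambda_a \text{ is absolutely continuous with respect to } \mu)$$

$$\lambda_s \perp \mu \qquad (\lambda_s \text{ and } \mu \text{ are mutually singular})$$

(b) There is a unique $h \in L^1(\mu)$ such that

$$\lambda_a(E) = \int_E h \, d\mu$$

for every set $E \in \mathcal{M}$.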

Some remarks are in order regarding what is important about the above
theorem.

1. The pair $(\lambda_a, \lambda_s)$ is called the Lebesgue decomposition of $\lambda$ relative to $\mu$.

2. Existence of the decomposition is the significant part of (a).

3. Part (b) is known as the Radon-Nikodym theorem.

4. The function $h$ is called the Radon-Nikodym derivative of $\lambda_a$ with respect
to $\mu$.
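
On a finite set every measure is determined by its point masses, which makes the
theorem easy to check by hand. The following sketch, restricted to positive
finite measures on a hypothetical four-point set, builds the Lebesgue
decomposition of $\lambda$ relative to $\mu$ and verifies part (b) on every subset.
Here the Radon-Nikodym derivative $h$ is simply the ratio of point masses on the
set where $\mu > 0$.

```python
from fractions import Fraction
from itertools import chain, combinations


def lebesgue_decomposition(lam, mu):
    """Split lam into (lam_a, lam_s) relative to mu and return h = d(lam_a)/d(mu).

    Measures are dicts mapping points of a finite set X to nonnegative masses.
    """
    support_mu = {x for x, m in mu.items() if m > 0}  # where mu is concentrated
    lam_a = {x: m for x, m in lam.items() if x in support_mu}
    lam_s = {x: m for x, m in lam.items() if x not in support_mu}
    # On a finite set the Radon-Nikodym derivative is a ratio of point masses.
    h = {x: lam_a.get(x, 0) / mu[x] for x in support_mu}
    return lam_a, lam_s, h


def measure(nu, E):
    """Mass that the measure nu assigns to the set E."""
    return sum(nu.get(x, 0) for x in E)


X = ["a", "b", "c", "d"]
lam = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
mu = {"b": Fraction(1, 2), "c": Fraction(1, 4), "d": Fraction(1, 4)}
lam_a, lam_s, h = lebesgue_decomposition(lam, mu)

# Part (b): lam_a(E) equals the integral of h over E with respect to mu, for all E.
subsets = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
assert all(measure(lam_a, E) == sum(h.get(x, 0) * mu.get(x, 0) for x in E) for E in subsets)
print(lam_s, h)
```

The singular part is the mass at "a", the one point that $\mu$ does not charge.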

The theorem, remarks, and definitions make much more sense after looking
at figure 4.1.


Figure 4.1. Graphic Representation of the Lebesgue-Radon-Nikodym Theorem

In this figure, the complete region inside the frame represents the set $X$. We
have defined two measurable sets in the same $\sigma$-algebra $\mathcal{M}$, and we
will refer to measures that are defined on this common $\sigma$-algebra. Set $A$,
in the left half of the figure, is the set of elements of $X$ on which the measure
$\lambda_s \neq 0$. We can say that set $A$ is the support of measure $\lambda_s$,
or that $\lambda_s$ is concentrated on $A$. Everywhere outside of $A$ we know that
$\lambda_s = 0$. Similarly, set $B$ is the set of elements of $X$ on which measure
$\mu \neq 0$. Thus, $B$ is the set on which $\mu$ is concentrated. In this
particular example, the sets $A$ and $B$ are disjoint, so at every point of $X$ at
least one of $\lambda_s$ and $\mu$ is zero. We say that $\lambda_s$ and $\mu$ are
mutually singular, and denote this by $\lambda_s \perp \mu$. Because $\lambda_s$ lives
entirely on a set where $\mu$ vanishes, it is called the singular part.

The notation for mutual singularity, $\lambda_s \perp \mu$, is suggestive of
orthogonality. Mutual singularity is a mathematically stronger concept than
orthogonality. All mutually singular functions are mutually orthogonal, but
mutually orthogonal functions are not necessarily mutually singular. Functions
that are mutually orthogonal may individually attain nonzero values on the
common set over which the pair of functions are orthogonal.

Mutual singularity is a property of functions that are measures defined on
a common $\sigma$-algebra. It is useful to think of these functions as having
mutually exclusive support.

Orthogonality is a property of a pair of functions, a common domain, and a
relation defined on those functions over the entire domain. Orthogonality does
not require the pair of functions to be measures. Orthogonality is a concept
usually dealt with when discussing inner product spaces. However, the inner
product is a stronger concept than what orthogonality requires of its relational
operator.
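
The distinction can be checked numerically. In the sketch below (the functions
and intervals are arbitrary choices), $\sin$ and $\cos$ have zero inner product
on $[0, 2\pi]$ even though both are nonzero on much of that interval, whereas two
nonnegative densities with disjoint supports have zero inner product because
their product vanishes everywhere.

```python
import math


def inner(f, g, a, b, n=20000):
    """Midpoint-rule approximation of the L2 inner product of f and g on [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) * g(a + (i + 0.5) * step) for i in range(n))


# Orthogonal but not mutually singular: sin and cos are both nonzero at pi/4.
ortho = inner(math.sin, math.cos, 0.0, 2.0 * math.pi)

# Densities with disjoint supports on [0, 2): mutually singular, hence orthogonal.
def f(x):
    return 1.0 if x < 1.0 else 0.0


def g(x):
    return 1.0 if x >= 1.0 else 0.0


singular = inner(f, g, 0.0, 2.0)
print(ortho, singular)
```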


Suppose we have another measure $\lambda_a$ that is concentrated on some subset
(possibly all) of $B$. Then everywhere $\lambda_a$ is nonzero, we know that $\mu$ is
also nonzero. An alternate way of saying the same thing is that everywhere $\mu$
is zero, we require $\lambda_a$ to also be zero. When this is true for every
measurable set $E$ belonging to $\mathcal{M}$, we say that $\mu$ dominates $\lambda_a$.
We denote this by $\lambda_a \ll \mu$.

The Lebesgue-Radon-Nikodym theorem says that when you are given a positive
$\sigma$-finite measure $\mu$ on a $\sigma$-algebra $\mathcal{M}$ in a set $X$, and are
also given any complex measure $\lambda$ defined on $\mathcal{M}$, then $\lambda$ has a
unique decomposition $\lambda = \lambda_a + \lambda_s$ satisfying the conditions
$\lambda_a \ll \mu$ and $\lambda_s \perp \mu$. Another way of saying this is that for
any given pair of measures $(\lambda, \mu)$ defined on the same $\sigma$-algebra
$\mathcal{M}$, there exists some subset $A$ of $X$ on which $\lambda \neq 0$ when
$\mu = 0$, and some subset $B$ of $X$ on which $\lambda \neq 0$ when $\mu \neq 0$.
This is a partitioning of the regions of $X$ on which $\lambda \neq 0$, where the
partition is determined by the region of $X$ where $\mu \neq 0$. In fact, the set
$A$ can be taken to be the complement of $B$ in $X$, $A = X \setminus B = B^c$.
When viewed in this way, it is apparent that this decomposition of $\lambda$ is
unique for a specified $\mu$. Note also that $\lambda_s \perp \lambda_a$.

The part of the theorem that deals with the Radon-Nikodym derivative
is a bit more subtle. If you look at every measurable set $E$ in the $\sigma$-algebra
$\mathcal{M}$, then there exists only one function $h$ (up to sets of $\mu$-measure
zero) that accurately describes the relationship between $\lambda_a$ and $\mu$ over
the whole $\sigma$-algebra. One of the conditions that must be satisfied is that
$\lambda_a$ and $\mu$ be defined over the same $\sigma$-algebra. Kolmogorov and
Fomin [141] point out that the Radon-Nikodym theorem only establishes the
existence of the derivative $h = d\lambda/d\mu$, but does not tell how to compute it.
They refer to Shilov and Gurevich (chapter 10) [242] for an explicit procedure for
evaluating this derivative at a point $x_0 \in X$ by calculating the limit

$$\lim_{\varepsilon \to 0} \frac{\lambda(E_\varepsilon)}{\mu(E_\varepsilon)}$$

where $\{E_\varepsilon\}$ is a system of sets converging to the point $x_0$ as
$\varepsilon \to 0$ in a suitably defined sense. In a very generalized way, this
might define a sequence of sets such that for $\varepsilon_k < \varepsilon_{k-1}$ we
have $E_k \subset E_{k-1}$, subject to the condition that $x_0 \in E_k$. In the case
of a function $f$ defined on $\mathbb{R}$, there is an explicit procedure for finding
the derivative of $f$ at a point $x_0$ given by

$$\lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}.$$
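
The limit can be watched numerically in the simplest case. In this sketch (an
assumed example, not one taken from Kolmogorov and Fomin), $\lambda$ is the
standard normal probability measure, $\mu$ is Lebesgue measure on $\mathbb{R}$, and
$E_\varepsilon = [x_0 - \varepsilon, x_0 + \varepsilon]$. The ratio
$\lambda(E_\varepsilon)/\mu(E_\varepsilon)$ should approach the normal density at
$x_0$, which is $d\lambda/d\mu$ there.

```python
import math


def std_normal_cdf(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rn_ratio(x0, eps):
    """lambda(E_eps) / mu(E_eps) for E_eps = [x0 - eps, x0 + eps].

    lambda is the standard normal probability measure and mu is Lebesgue
    measure on the real line, so mu(E_eps) = 2 * eps.
    """
    return (std_normal_cdf(x0 + eps) - std_normal_cdf(x0 - eps)) / (2.0 * eps)


x0 = 0.5
density = math.exp(-x0 ** 2 / 2.0) / math.sqrt(2.0 * math.pi)  # d(lambda)/d(mu) at x0
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, rn_ratio(x0, eps), abs(rn_ratio(x0, eps) - density))
```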

There are some handy rules for working with the Radon-Nikodym derivative.
They are very similar to the rules for working with ordinary derivatives.
The primary difference is the explicit statement of conditions under which the
rules work. The following are given by Phillips (p. 429) [207].

Theorem 2 Manipulation Rules for the Radon-Nikodym Derivative.

1. If $a, b \in \mathbb{R}^+$, $\nu \ll \mu$, and $\lambda \ll \mu$, then

$$\frac{d(a\nu + b\lambda)}{d\mu} = a\,\frac{d\nu}{d\mu} + b\,\frac{d\lambda}{d\mu}.$$

Note that both measures $\nu$ and $\lambda$ are dominated by the same measure
$\mu$. It is not strictly correct to call $d/d\mu$ an operator; the technical point
here is that $d/d\mu$ by itself is meaningless. However, if you did consider it to
be an operator, this shows that the operator is linear.

2. If $\nu \ll \mu$ and $\mu \ll \lambda$, then $\nu \ll \lambda$. The relation $\ll$ is transitive.

3. Given measures $\nu$, $\mu$, and $\lambda$ such that $\nu \ll \mu$ and $\mu \ll \lambda$, there is a
chain rule

$$\frac{d\nu}{d\lambda} = \left(\frac{d\nu}{d\mu}\right)\left(\frac{d\mu}{d\lambda}\right).$$

Note that the measure in the denominator of each term dominates the
corresponding measure in the numerator.

4. If $\nu \ll \mu$ and $\mu \ll \nu$, then

$$\frac{d\nu}{d\mu} = \left(\frac{d\mu}{d\nu}\right)^{-1}.$$

What does it mean for $\nu \ll \mu$ and $\mu \ll \nu$? It means that measures $\nu$
and $\mu$ have the same region of support in $X$, or equivalently that $\nu$ and $\mu$
are concentrated on the same set.
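
The four rules can be checked exactly on a finite set, where every
Radon-Nikodym derivative is a ratio of point masses defined on the support of the
dominating measure. The measures below are hypothetical examples chosen so that
$\nu \ll \mu \ll \lambda$ and $\mu \ll \nu$; the extra measure $\kappa \ll \mu$ is
introduced only to test linearity, and the helper names `rn` and `combine` are
illustrative.

```python
from fractions import Fraction as Fr


def rn(nu, mu):
    """d(nu)/d(mu) on a finite set, defined on the points where mu has positive mass."""
    if any(m > 0 and mu.get(x, 0) == 0 for x, m in nu.items()):
        raise ValueError("nu is not absolutely continuous with respect to mu")
    return {x: nu.get(x, 0) / m for x, m in mu.items() if m > 0}


def combine(a, nu, b, lam):
    """The measure a*nu + b*lam."""
    return {x: a * nu.get(x, 0) + b * lam.get(x, 0) for x in set(nu) | set(lam)}


lam = {"a": Fr(1, 4), "b": Fr(1, 4), "c": Fr(1, 2)}  # largest support
mu = {"a": Fr(1, 2), "b": Fr(1, 2)}                   # mu << lam
nu = {"a": Fr(1, 3), "b": Fr(2, 3)}                   # nu << mu and mu << nu
kap = {"a": Fr(1, 1)}                                 # kap << mu

# Rule 1 (linearity), with a = 2 and b = 3.
lin = rn(combine(2, nu, 3, kap), mu)
assert lin == {x: 2 * rn(nu, mu)[x] + 3 * rn(kap, mu)[x] for x in lin}
# Rules 2 and 3: rn(nu, lam) exists (transitivity) and obeys the chain rule.
# Where mu has no mass, d(mu)/d(lam) = 0, so the product is 0 there as well.
assert rn(nu, lam) == {x: rn(nu, mu).get(x, 0) * rn(mu, lam)[x] for x in rn(nu, lam)}
# Rule 4 (inverse) for the mutually dominating pair nu, mu.
assert rn(nu, mu) == {x: 1 / v for x, v in rn(mu, nu).items()}
print(rn(nu, mu), rn(nu, lam))
```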

A very interesting note is that if $\nu$ and $\mu$ are both measures defined on the
same $\sigma$-algebra $\mathcal{M}$, then $\nu + \mu$ is also a measure defined on
$\mathcal{M}$. Further, this new measure $\nu + \mu$ dominates both $\nu$ and $\mu$.
When we specialize our discussion to probability measures, if $\nu$ and $\mu$ are
probability measures, then $a\nu + b\mu$ is also a probability measure when
$a, b \ge 0$ and $a + b = 1$. In general, any convex sum

$$S = \sum_{k=1}^{n} \alpha_k \mu_k, \qquad \alpha_k > 0, \quad \sum_{k=1}^{n} \alpha_k = 1,$$

of probability measures $\{\mu_k\}_{k=1}^{n}$ defined on the same $\sigma$-algebra
$\mathcal{M}$ is also a probability measure, and that convex sum dominates each
individual probability measure. It also dominates any convex sum formed from a
subset of those probability measures.
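
A quick exact check with hypothetical discrete probability measures: the convex
sum is again a probability measure, and it dominates each component and any
sub-mixture. The weights must be strictly positive, since a component given
weight zero need not be dominated.

```python
from fractions import Fraction as Fr


def mixture(weights, measures):
    """Convex sum of discrete probability measures given as dicts of point masses."""
    if any(w <= 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must be positive and sum to 1")
    out = {}
    for w, m in zip(weights, measures):
        for x, p in m.items():
            out[x] = out.get(x, 0) + w * p
    return out


def dominates(big, small):
    """True if small << big, i.e. big puts positive mass wherever small does."""
    return all(big.get(x, 0) > 0 for x, p in small.items() if p > 0)


mu1 = {0: Fr(1)}  # point mass at 0
mu2 = {1: Fr(1, 2), 2: Fr(1, 2)}
mu3 = {2: Fr(1, 4), 3: Fr(3, 4)}
S = mixture([Fr(1, 2), Fr(1, 3), Fr(1, 6)], [mu1, mu2, mu3])
print(sum(S.values()), all(dominates(S, m) for m in (mu1, mu2, mu3)))  # 1 True
```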

4.2.3 Kiefer and Wolfowitz Development of Maximum Likelihood Estimators

We are now prepared to consider the work of Kiefer and Wolfowitz [140]. In
examining the source literature, the reader will notice that Kiefer and Wolfowitz
denote the parameter space by $\Omega \times \Gamma$, where I have only used
$\Gamma$. They used the more structured space definition to facilitate their proof
of consistency. Their level of detail is not required for the development of the
following ideas.

Recall that a likelihood function is the conditional joint distribution of a
collection of random samples for a given underlying distribution. The usual
case of interest is where the underlying distribution is the unknown being
sought. When the random samples are assumed independent and identically
distributed, the likelihood function is the product of the respective marginal
conditional distributions. We consider two classes of maximum likelihood
estimators, which are distinguished by the existence or non-existence of some
dominating measure $\mu$.

Maximum Likelihood Estimators when a Dominating Measure Exists

Let a dominating measure $\mu$ exist. This assumption distinguishes the following
generalizations from the ones that require use of the Lebesgue-Radon-Nikodym
derivative.

Maximum Likelihood Estimator (MLE)

For a given random sample of size $n$, such a likelihood function can be
expressed by

$$L(z_1, \ldots, z_n \mid \gamma) = \prod_{i=1}^{n} f(z_i \mid \gamma)$$

where $\gamma$ is the underlying distribution, which we recall is a measurable
function. We are interested in a sequence of $\mu$-measurable functions
$\{\hat\gamma_n\}$ such that

$$L(z_1, \ldots, z_n \mid \hat\gamma_n(z_1, \ldots, z_n)) \ge \sup\{L(z_1, \ldots, z_n \mid \gamma),\ \gamma \in \Gamma\}$$

for almost all $(z_1, \ldots, z_n)$ with respect to measure $\mu$, and for all sample
sizes $n \in \mathbb{N}$.

Let $\mathbf{z}_n = (z_1, \ldots, z_n)$ and consider

$$\sup\{L(z_1, \ldots, z_n \mid \gamma),\ \gamma \in \Gamma\} \overset{\mathrm{def}}{=} \sup\{L(\mathbf{z}_n \mid \gamma),\ \gamma \in \Gamma\}$$

where $L$ is a mapping from the product space $Z \times \Gamma$ into some space $Y$.
The supremum is taken in $Y$. We assume that this supremum is finite; finiteness
of $L$ at every $(\gamma, n)$ does not by itself guarantee it. Then I can have
different sequences of $\gamma \in \Gamma$ that produce sequences of $L$ converging
to its supremum, as shown in figure 4.2.


Figure 4.2. Maximum Likelihood Estimate (MLE) Convergent Sequences of L

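There is no guarantee that the sequences which have a common supremum are
produced by a unique sequence $\{\gamma_{n,k}\}$. For some fixed value of $n$, we
can observe the following.

$$
\begin{array}{lll}
\text{Supremum, } L \text{ sequence} & \text{Parameter sequence} & \text{Limit} \\
\sup\{L(\mathbf{z}_n \mid \gamma_{11}), \ldots, L(\mathbf{z}_n \mid \gamma_{1k}), \ldots\} & \{\gamma_{1k}\} & \tilde\gamma_1 \\
= \sup\{L(\mathbf{z}_n \mid \gamma_{21}), \ldots, L(\mathbf{z}_n \mid \gamma_{2k}), \ldots\} & \{\gamma_{2k}\} & \tilde\gamma_2 \\
\quad \vdots & \quad \vdots & \quad \vdots \\
= \sup\{L(\mathbf{z}_n \mid \gamma_{m1}), \ldots, L(\mathbf{z}_n \mid \gamma_{mk}), \ldots\} & \{\gamma_{mk}\} & \tilde\gamma_m
\end{array}
$$

So, we get a maximum likelihood estimator, not necessarily a unique maximum
likelihood estimator. It is possible that $\tilde\gamma_m$ is not contained within
the set $\Gamma$ of allowable distributions or measurable functions. If no
$\tilde\gamma_m$ is contained in $\Gamma$, then we say that the maximum likelihood
estimator does not exist.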
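
A standard textbook case, assumed here for illustration rather than taken from
Kiefer and Wolfowitz, shows how the supremum can escape the allowable set. Take
Bernoulli observations with the parameter restricted to the open interval
$\Gamma = (0, 1)$ and a sample consisting only of successes, so
$L(\mathbf{z}_n \mid p) = p^n$. The supremum 1 is approached by the sequence
$p_k = 1 - 1/k$ in $\Gamma$, but its limit $\tilde\gamma = 1$ lies outside $\Gamma$,
so no maximum likelihood estimator exists.

```python
def bernoulli_likelihood(sample, p):
    """L(z_1, ..., z_n | p) for independent 0/1 observations."""
    out = 1.0
    for z in sample:
        out *= p if z == 1 else 1.0 - p
    return out


sample = [1, 1, 1, 1]  # every trial a success
# Allowable set Gamma = open interval (0, 1); sup L = 1 would need p = 1, outside Gamma.
p_seq = [1.0 - 1.0 / k for k in range(2, 1001)]  # a sequence in Gamma tending to 1
L_seq = [bernoulli_likelihood(sample, p) for p in p_seq]
print(L_seq[0], L_seq[-1])  # increases toward 1 but never reaches it
```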

Modified Maximum Likelihood Estimator (MMLE)

This is an approach that extends the concept of a maximum likelihood estimator
to increase the number of cases for which a maximum likelihood estimator exists.
As with the maximum likelihood estimator, we seek
$\sup\{L(\mathbf{z}_n \mid \gamma),\ \gamma \in \Gamma\}$. The supremum of $L$ is taken
in the set $Y$.

We are interested in a sequence of $\mu$-measurable functions $\{\hat\gamma_n\}$
such that, for some $0 < c < 1$, we have

$$L(z_1, \ldots, z_n \mid \hat\gamma_n(z_1, \ldots, z_n)) \ge c \cdot \sup\{L(z_1, \ldots, z_n \mid \gamma),\ \gamma \in \Gamma\}$$

for almost all $(z_1, \ldots, z_n)$ with respect to measure $\mu$, and for all sample
sizes $n \in \mathbb{N}$. In the limiting case $c = 1$, this is the usual maximum
likelihood estimator.

Consider looking at a number a little less than
$W = \sup\{L(\mathbf{z}_n \mid \gamma),\ \gamma \in \Gamma\}$, such as $cW$ where
$c \in (0, 1)$. Then, for $c$ sufficiently small, we hope to find $\gamma_n \in \Gamma$
such that $L(\mathbf{z}_n \mid \gamma_n) \ge cW$. In essence, we are defining a distance
between $L(\mathbf{z} \mid \gamma_1)$ and $L(\mathbf{z} \mid \gamma_2)$; call it
$\rho(L_1, L_2)$. Conceptually, we want to find those $L_2$ having $\gamma \in \Gamma$
such that $\rho(L_1, L_2) < \varepsilon$, where $L_1$ is the supremum of
$\{L(\mathbf{z}_n \mid \gamma),\ \gamma \in \Gamma\}$, for some $\varepsilon > 0$.
Figure 4.3 illustrates the concept. It shows a region of $\Gamma$ such that the
maximum likelihood estimator is not in $\Gamma$, but
$\rho(L(\mathbf{z}_n \mid \tilde\gamma), L(\mathbf{z}_n \mid \gamma^*)) < \varepsilon$.


Figure 4.3. Modified Maximum Likelihood Estimator (MMLE) Convergent Sequences of L

The modified maximum likelihood estimators found in this way are not
necessarily in the neighborhood of a maximum likelihood estimator when one
exists, but a maximum likelihood estimator is always also a modified maximum
likelihood estimator. For parameterized distributions, it is possible that a
modified maximum likelihood estimator $\gamma^*$ could be at a considerable
distance (by some suitably chosen distance function) from any maximum
likelihood estimate $\hat\gamma$.
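
Take again an assumed Bernoulli example: $\Gamma = (0, 1)$ and a sample of $n$
successes, so $L(\mathbf{z}_n \mid p) = p^n$ and the supremum 1 is not attained in
$\Gamma$. A modified maximum likelihood estimator nevertheless exists for every
$0 < c < 1$, because any $p$ with $p^n \ge c$, that is $p \ge c^{1/n}$, qualifies. The
sketch picks one such value; the function name and the midpoint choice are
illustrative, and the whole interval $[c^{1/n}, 1)$ shows how far from unique the
estimator is.

```python
def bernoulli_likelihood(sample, p):
    """L(z_1, ..., z_n | p) for independent 0/1 observations."""
    out = 1.0
    for z in sample:
        out *= p if z == 1 else 1.0 - p
    return out


def mmle_all_successes(n, c):
    """A modified MLE in Gamma = (0, 1) when all n trials succeed.

    Here sup L = 1, so any p in [c**(1/n), 1) satisfies L >= c * sup L;
    taking the midpoint of that interval keeps p strictly inside Gamma.
    """
    lower = c ** (1.0 / n)
    return 0.5 * (lower + 1.0)


sample = [1, 1, 1, 1]
for c in (0.5, 0.9, 0.99):
    p = mmle_all_successes(len(sample), c)
    print(c, p, bernoulli_likelihood(sample, p))
```
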

Neighborhood Maximum Likelihood Estimator (NMLE)

A neighborhood maximum likelihood estimator is a sequence of $\mu$-measurable
functions $\{\gamma_n^*\}$ satisfying

$$\sup\{L(z_1, \ldots, z_n \mid \gamma),\ \gamma \in \Gamma,\ \delta(\gamma, \gamma_n^*(z_1, \ldots, z_n)) < \varepsilon_n\} = \sup\{L(z_1, \ldots, z_n \mid \gamma),\ \gamma \in \Gamma\}$$

for almost all $(z_1, \ldots, z_n)$ with respect to measure $\mu$, for a sequence
$\{\varepsilon_n\}$ where $\varepsilon_n > 0$ and $\varepsilon_n \to 0$.

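Again, let $L$ be a mapping from the product space $Z \times \Gamma$ into $Y$. As
before, a whole set of parameter values can be obtained that produce the same
$\sup L$. Call these $\{\tilde\gamma_i\}_{i=1}^{m}$ for a fixed $n$. Then we get the
following.

$$
\begin{array}{ll}
\text{Supremum, } L \text{ sequence} & \text{Parameter sequence} \\
\sup\{L(\mathbf{z}_n \mid \gamma_{11}), \ldots, L(\mathbf{z}_n \mid \gamma_{1k}), \ldots\} & \lim_{k \to \infty} \{\gamma_{1k}\} = \tilde\gamma_1 \\
\quad \vdots & \quad \vdots \\
= \sup\{L(\mathbf{z}_n \mid \gamma_{m1}), \ldots, L(\mathbf{z}_n \mid \gamma_{mk}), \ldots\} & \lim_{k \to \infty} \{\gamma_{mk}\} = \tilde\gamma_m
\end{array}
$$

The concept is illustrated in figure 4.4.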

Pick some $\varepsilon_n > 0$, and define a distance function
$\delta(\gamma_1, \gamma_2)$. Then $\gamma_m^*$ is any $\gamma$ within distance
$\varepsilon_n$ of $\tilde\gamma_m$. There is then a family $\{\gamma_m^*\}$ of
neighborhood maximum likelihood estimators, just as there was a family of
modified or traditional maximum likelihood estimators in the previous examples.
What has been gained is that the neighborhood maximum likelihood estimator
exists.

Even though $\tilde\gamma$ might be outside the space $\Gamma$ of allowed
parameters, $\gamma^*$ can be chosen in $\Gamma$. When you find
$\sup\{L(\mathbf{z}_n \mid \gamma),\ \gamma \in \Gamma,\ \delta(\gamma, \gamma^*) < \varepsilon_n\}$,
you ensure you have located a neighborhood maximum likelihood estimator by
constraining this to equal $\sup\{L(\mathbf{z}_n \mid \gamma),\ \gamma \in \Gamma\}$.


Figure 4.4. Neighborhood Maximum Likelihood Estimator (NMLE) Convergent
Sequences of L
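
In an assumed Bernoulli example ($\Gamma = (0, 1)$, a sample of $n$ successes, so
$L(\mathbf{z}_n \mid p) = p^n$), the neighborhood construction succeeds where the
ordinary one fails. The limit point $\tilde\gamma = 1$ lies outside $\Gamma$, but
$\gamma_n^* = 1 - \varepsilon_n/2$ lies in $\Gamma$, and the supremum of the
likelihood over the $\varepsilon_n$-neighborhood of $\gamma_n^*$ equals the
supremum over all of $\Gamma$, which is the defining condition above. The helper
functions are hypothetical and rely on $p^n$ being increasing.

```python
def sup_likelihood_all_successes(n, lo, hi):
    """sup of p**n over the interval (lo, hi) intersected with Gamma = (0, 1).

    p**n is increasing on (0, 1), so the supremum is the right endpoint
    (capped at 1) raised to the n-th power.
    """
    return min(hi, 1.0) ** n


def nmle_all_successes(eps):
    """A point of Gamma within eps of the limit point 1, which lies outside Gamma."""
    return 1.0 - eps / 2.0


n = 4
overall = sup_likelihood_all_successes(n, 0.0, 1.0)  # sup over all of Gamma
for eps in (0.1, 0.01, 0.001):  # a sequence eps_n decreasing to 0
    p_star = nmle_all_successes(eps)
    local = sup_likelihood_all_successes(n, p_star - eps, p_star + eps)
    print(eps, p_star, local == overall)  # True for every eps
```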

Maximum Likelihood Estimators Without Requiring Existence of a
Dominating Measure

I seek to generalize the concepts above to ensure the existence of an estimator
in its generalized form. For this, we turn to the Radon-Nikodym derivative as
a generalization of a density function.


Generalized Maximum Likelihood Estimator (GMLE)


Following Johansen [123], let $P_1$ and $P_2$ be members of a non-dominated
family of probability measures $\mathcal{P}$. Thus, $P_1$ and $P_2$ are measures,
and there is no $\sigma$-finite measure $\lambda$ that dominates all the measures
$P_k \in \mathcal{P}$. Recall that if $P_k \ll \lambda$, then wherever $\lambda = 0$ we
also require $P_k = 0$, for every measurable set $E$ belonging to the
$\sigma$-algebra $\mathcal{M}$. We know, however, that if we sum any two measures,
the sum dominates the individual measures. Thus $P_1 \ll P_1 + P_2$ and also
$P_2 \ll P_1 + P_2$. Define the Radon-Nikodym derivative

$$r(\mathbf{z}_n, P_1, P_2) = \frac{dP_1}{d(P_1 + P_2)}(\mathbf{z}_n),$$

the Radon-Nikodym derivative of the measure $P_1$ with respect to the dominating
measure $(P_1 + P_2)$, evaluated at the point $\mathbf{z}_n$. Then define $\hat P$ as
the generalized maximum likelihood estimator if, for arbitrary fixed
$\mathbf{z}_n$, the condition $r(\mathbf{z}_n, \hat P, P) \ge r(\mathbf{z}_n, P, \hat P)$ is
satisfied for all $P \in \mathcal{P}$. This says that $\hat P$ is the generalized
maximum likelihood estimator if

$$\frac{d\hat P}{d(\hat P + P)}(\mathbf{z}_n) \ge \frac{dP}{d(\hat P + P)}(\mathbf{z}_n) \tag{4.1}$$

for all $P \in \mathcal{P}$.

So, we are searching over the space of all allowable probability measures
for the one that maximizes the Radon-Nikodym derivative, when taken with
respect to the pairwise sum of the maximizing measure and each other
allowable measure. Johansen [123] notes that when $\mathcal{P}$ is dominated by a
$\sigma$-finite measure $\mu$, equation 4.1 is equivalent to the usual definition of a
maximum likelihood estimator.
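
For discrete measures the generalized density is easy to evaluate: if $z$ is an
atom of $P_1 + P_2$, then $r(z, P_1, P_2) = P_1(\{z\}) / (P_1(\{z\}) + P_2(\{z\}))$.
The sketch below uses this to apply condition (4.1) to two hypothetical families
with a single observation. The first is a family of point masses $\delta_\theta$;
over all $\theta \in \mathbb{R}$ this family is not dominated by any $\sigma$-finite
measure (a finite grid of $\theta$ stands in for it here), yet the criterion still
picks out $\delta_z$. The second is a Poisson family, dominated by counting
measure, where the criterion reduces to comparing ordinary likelihoods.
Returning 0 off the atoms of $P_1 + P_2$ is a convention of the sketch, since the
derivative is only determined up to $(P_1 + P_2)$-null sets.

```python
import math


def r(z, P1, P2):
    """dP1/d(P1 + P2) at z, for discrete measures given as point-mass functions."""
    total = P1(z) + P2(z)
    return P1(z) / total if total > 0 else 0.0  # convention off the atoms of P1 + P2


def gmle(z, family):
    """Names of members P_hat satisfying r(z, P_hat, P) >= r(z, P, P_hat) for all P."""
    return [name for name, P_hat in family.items()
            if all(r(z, P_hat, P) >= r(z, P, P_hat) for P in family.values())]


# Point masses delta_theta on a grid of theta values (stand-in for all theta in R).
point_masses = {f"delta_{t}": (lambda z, t=t: 1.0 if z == t else 0.0) for t in range(4)}
# Poisson laws, all dominated by counting measure on the nonnegative integers.
poisson = {f"pois_{m}": (lambda z, m=m: math.exp(-m) * m ** z / math.factorial(z))
           for m in (1, 2, 3)}

print(gmle(2, point_masses))  # ['delta_2']
print(gmle(2, poisson))       # ['pois_2'], the ordinary maximum likelihood choice
```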

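The following are some useful observations made by Kundu [157].
Suppose that $r(\mathbf{z}, \hat P, P) \ge r(\mathbf{z}, P, \hat P)$ and $\hat P + P \ll \mu$.
Now perform a change of variables. Let

$$\hat P(E) = \int_E r(\mathbf{z}, \hat P, P)\, d(\hat P + P)$$

and let

$$P(E) = \int_E r(\mathbf{z}, P, \hat P)\, d(\hat P + P).$$

Then

$$\hat P(E) = \int_E g(\mathbf{z}, \hat P, P)\, d\mu \quad \text{and} \quad P(E) = \int_E g(\mathbf{z}, P, \hat P)\, d\mu,$$

where

$$g(\mathbf{z}, \hat P, P) = r(\mathbf{z}, \hat P, P)\, \frac{d(\hat P + P)}{d\mu}$$

and $g(\mathbf{z}, P, \hat P)$ is defined likewise. Therefore,

$$r(\mathbf{z}, \hat P, P) \ge r(\mathbf{z}, P, \hat P)$$

implies

$$g(\mathbf{z}, \hat P, P) \ge g(\mathbf{z}, P, \hat P),$$

and both imply $\hat P(E) \ge P(E)$.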

C.  R.  Rao  [216]  provided  the  following  useful  observation.  To  see  the 
relationship  between  Johansen’s  expression  for  generalized  maximum  likeli¬ 
hood  estimator  and  that  defined  by  Kiefer  and  Wolfowitz,  first  observe  that  if 

dPi  ^  d(Pi  +  P2)  -  dPi  ^  dP2 
''  d(P,  -hP2)  d(P,  +P2)  d(Pi-hP2) 


Radon-Nikodym  derivative  does  not  avoid  the  issues  of  existence  of  a  maximum 
likelihood  estimator,  or  convergence,  or  uniqueness.  The  same  geometry  as 
shown  in  figure  4.2  applies. 
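Rao's identity can be taken one step further. Since $dP_1/d(P_1+P_2) + dP_2/d(P_1+P_2) = 1$ holds $(P_1+P_2)$-almost everywhere, the Kiefer-Wolfowitz condition compares a single derivative with one half:

$$r(z_n, \hat P, P) \ge r(z_n, P, \hat P) \iff r(z_n, \hat P, P) \ge 1 - r(z_n, \hat P, P) \iff r(z_n, \hat P, P) \ge \tfrac{1}{2}.$$

For densities $\hat p$ and $p$ with respect to a common σ-finite measure, the last inequality reads $\hat p(z_n)/(\hat p(z_n) + p(z_n)) \ge 1/2$, that is, $\hat p(z_n) \ge p(z_n)$, which is the ordinary maximum likelihood condition.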


Generalized  Modified  Maximum  Likelihood  Estimator 


As  with  the  generalized  maximum  likelihood  estimator  (GMLE),  we  extend 
the  definition  of  the  modified  maximum  likelihood  estimator  by  using  the 
Radon-Nikodym  derivative  as  a  generalized  density.  The  discussion  about  the 
existence  and  uniqueness  of  a  modified  maximum  likelihood  estimator  also 
applies  to  the  generalized  modified  maximum  likelihood  estimator  (GMMLE). 

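Consider the form

$$\frac{d\hat P}{d(\hat P + P)}(z_n) \ge c\, \frac{dP}{d(\hat P + P)}(z_n)$$

for all $P \in \mathcal{P}$. This is equivalent to saying

$$c\, \frac{\dfrac{dP}{d(\hat P + P)}(z_n)}{\dfrac{d\hat P}{d(\hat P + P)}(z_n)} = c\, d(z_n, \hat P, P) \le 1$$

for all $P \in \mathcal{P}$. In particular,

$$c \sup\{d(z_n, \hat P, P) : P \in \mathcal{P}\} \le 1.$$

When $c = 1$, this is the generalized maximum likelihood estimator.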
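In the discrete case the ratio $d(z_n, \hat P, P)$ reduces to $p(z_n)/\hat p(z_n)$, because the common factor coming from $\hat P + P$ cancels. The following Python sketch, assuming $0 < c \le 1$, a hypothetical finite family, and a hypothetical function name, returns every candidate that satisfies the generalized modified condition. With $c = 1$ it returns only the generalized maximum likelihood estimator, while a smaller c also admits near-maximizers.

```python
from fractions import Fraction as F


def generalized_modified_mle(z, family, c):
    """Names of all P_hat with c * sup_P d(z, P_hat, P) <= 1, where d = p(z) / p_hat(z)."""
    accepted = []
    for name, p_hat in family.items():
        if p_hat[z] == 0:
            continue  # the ratio d is undefined; such candidates are skipped in this sketch
        sup_d = max(p[z] / p_hat[z] for p in family.values())
        if c * sup_d <= 1:
            accepted.append(name)
    return accepted


# Hypothetical family of distributions on the sample space {0, 1, 2}.
family = {
    "A": {0: F(1, 2), 1: F(1, 4), 2: F(1, 4)},
    "B": {0: F(1, 6), 1: F(2, 3), 2: F(1, 6)},
    "C": {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)},
}

print(generalized_modified_mle(0, family, F(1)))     # ['A']
print(generalized_modified_mle(0, family, F(1, 2)))  # ['A', 'C']
```

Lowering c enlarges the accepted set, which is why the modified estimator inherits the existence and uniqueness questions of the ordinary one.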


Generalized  Neighborhood  Maximum  Likelihood  Estimator 

The concept of the Generalized Neighborhood Maximum Likelihood Estimator (GNMLE) is to find "maximum likelihood estimators" and choose an estimator whose distance from them is less than some ε within the allowable parameter space or family of distributions.

In using the Radon-Nikodym derivative as a generalized density, the procedure becomes:

1. Define a distance function $\delta(P_1, P_2)$.

2. Choose $\varepsilon > 0$.

3. Find the set $\{P^m\}$ of functions, possibly not within the allowable space $\mathcal{P}$, which satisfy

$$\sup\{d(z_n, P, P^m) : P \in \mathcal{P}\} \le 1.$$

4. Pick a $P_m$ within $\mathcal{P}$ corresponding to each $P^m$ such that $\delta(P_m, P^m) < \varepsilon$.

If $P^*$ is the function found by

$$\sup\{d(z_n, P, P^*) : P \in \mathcal{P}\} \le 1,$$

then the generalized neighborhood maximum likelihood estimator $\hat P$ satisfies

$$\sup\{r(z_n, P, P^*) : P \in \mathcal{P},\ \delta(P, P^*) < \varepsilon\} = \sup\{r(z_n, P, P^*) : P \in \mathcal{P}\}.$$

Again, the discussion about existence and uniqueness regarding the neighborhood maximum likelihood estimator also applies to the generalized neighborhood maximum likelihood estimator.
4.2.4 Uniqueness of the Maximum Likelihood Estimator

A question was raised about the uniqueness of the maximum likelihood estimate. The suggestion that, for general sets of distributions, a maximum likelihood estimator might not be unique is presented pictorially in figure 4.1. C. R. Rao suggested that when the random sample constitutes the whole sample space, the maximum likelihood estimator would be unique. In general, the method of maximum likelihood does not produce a unique estimator. However, when the full sample space is included in the formulation of the likelihood function, the maximum likelihood estimator is unique almost everywhere.

Counterexample  to  Uniqueness 

Hogg and Craig (p. 207, problem 6.3) [109] provide a counterexample. Let $x_1, x_2, \ldots, x_n$ be a random sample from a distribution with density function $f(x \mid \theta) = 1$ for $\theta - \tfrac{1}{2} \le x \le \theta + \tfrac{1}{2}$, $-\infty < \theta < \infty$, and $f(x \mid \theta) = 0$ elsewhere. Let be a proper subset of the full sample space.

Then let $y_1 < y_2 < \cdots < y_n$ be the order statistics from this random sample. Every statistic $u(x_1, x_2, \ldots, x_n)$ such that

$$y_n - \tfrac{1}{2} \le u(x_1, x_2, \ldots, x_n) \le y_1 + \tfrac{1}{2}$$

is a maximum likelihood estimator of θ. In particular,

$$\frac{4y_1 + 2y_n + 1}{6}, \qquad \frac{y_1 + y_n}{2}, \qquad \text{and} \qquad \frac{2y_1 + 4y_n - 1}{6}$$

are three such statistics. Thus, uniqueness is not in general a property of a maximum likelihood estimator.
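The counterexample is easy to check numerically. The Python sketch below draws a hypothetical sample of size 10 from the uniform density with θ = 3 (the seed, sample size, and θ are arbitrary choices for illustration). It evaluates the likelihood, which is 1 when every observation lies in $[\theta - 1/2, \theta + 1/2]$ and 0 otherwise, and shows that all three statistics attain the maximum value 1 while taking different values.

```python
import random


def likelihood(theta, sample):
    """Likelihood for f(x | theta) = 1 on [theta - 1/2, theta + 1/2] and 0 elsewhere."""
    return 1.0 if all(theta - 0.5 <= x <= theta + 0.5 for x in sample) else 0.0


random.seed(1)
true_theta = 3.0  # hypothetical parameter value for the simulation
sample = [random.uniform(true_theta - 0.5, true_theta + 0.5) for _ in range(10)]
y1, yn = min(sample), max(sample)

estimates = {
    "(4*y1 + 2*yn + 1)/6": (4 * y1 + 2 * yn + 1) / 6,
    "(y1 + yn)/2": (y1 + yn) / 2,
    "(2*y1 + 4*yn - 1)/6": (2 * y1 + 4 * yn - 1) / 6,
}
for label, theta_hat in estimates.items():
    # Each estimate lies in [yn - 1/2, y1 + 1/2], so each attains the maximum likelihood 1.
    print(f"{label:22s} {theta_hat:.4f} likelihood={likelihood(theta_hat, sample)}")
```

Any value in the interval $[y_n - 1/2,\ y_1 + 1/2]$ would do equally well, so the maximizer is a whole interval rather than a point.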

When  the  Random  Sample  is  the  Full  Space 

Recall from the Lebesgue-Radon-Nikodym theorem that when μ is a positive σ-finite measure on a σ-algebra $\mathfrak{M}$ in a set X, and λ is a complex measure on $\mathfrak{M}$, then there is a unique (a.e. [μ]) function $h \in L^1(\mu)$ such that $\lambda_a(E) = \int_E h\, d\mu$ for every set $E \in \mathfrak{M}$, where $\lambda = \lambda_a + \lambda_s$, $\lambda_a \ll \mu$, and $\lambda_s \perp \mu$. This means that if two functions $h_1$ and $h_2$ satisfy this, then they differ only on a set of μ-measure zero, i.e., $\mu\{x : h_1(x) \neq h_2(x)\} = 0$.

When the set E is the whole sample space, then $\lambda_a(E) = \lambda_a(X) = 1$ when $(X, \mathfrak{M})$ is a probability space. Thus $\int_X h\, d\mu = 1$. When μ is taken to be Lebesgue measure of the appropriate dimension and X is Euclidean, then h is our probability density function in the usual sense, and the measure is often denoted by m.
If $h(x, \theta)$ is a parameterized family of density functions, consider the collection of all $\theta_a$ such that $\int_X h(x, \theta_a)\, dm = 1$. Then $h(x, \theta_{a_1}) = h(x, \theta_{a_2})$ a.e. [m]. In the general case, this does not require $\theta_{a_1} = \theta_{a_2}$. To assert uniqueness, more must be known about the family of density functions under consideration. For example, we know from Bickel and Doksum (p. 106, theorem 3.3.2) [40] that the exponential family given by

$$f(x; \theta) = \exp\left\{\sum_{i=1}^{k} c_i(\theta)\, T_i(x) + d(\theta) + S(x)\right\},$$

where $x \in A$ and $\theta \in \Theta$, with C denoting the interior of the range of $(c_1(\theta), \ldots, c_k(\theta))$, has a unique maximum likelihood estimator of θ if the equations $E_\theta\{T_i(X)\} = T_i(x)$, $i = 1, \ldots, k$, have a solution $\hat\theta(x) = (\hat\theta_1(x), \ldots, \hat\theta_k(x))$ for which

$$(c_1(\hat\theta(x)), \ldots, c_k(\hat\theta(x))) \in C.$$

Thus, if we sufficiently restrict the allowable set of functions, we can achieve uniqueness, but uniqueness is not automatically a property.

Not everything that needs to be recorded has been recorded here. In particular, some thought is needed with respect to singular distributions and what they mean in terms of allowable sets in $\mathfrak{M}$, as well as the implications for choosing μ. This question is relevant to this thesis topic but has not been pursued.


4.3  Specific  Techniques 
Most workers dealing with order estimation adopt an information-theoretic approach. Techniques based on this approach have the advantage that we know how to do the computations today to get answers. Some very nice analytical surveys of techniques have appeared from time to time, although new techniques are being developed almost as fast as they can be printed. Because these are attractive alternatives to the work in my thesis, they are cataloged here for the reader's use. Some of these had their birth in the study of univariate real time series. There are also techniques listed here that use approaches other than information-theoretic ones. Many of the techniques below have been discussed in the context of a line array with equally spaced elements, using the spatial analog to sampling a stochastic sequence indexed by time.

Pukkila and Krishnaiah [211] report that most of the proposed information-theoretic order determination criteria for ARMA(p, q) models can be expressed in the form of equation 4.2. The word ARMA should not be a distraction; that was the motivating context of the discussion by Pukkila and Krishnaiah. If you prefer, let q = 0 to apply this to an autoregressive problem, which has a spatial analog in the equally spaced line array. Even more basic than that, the criteria of the form discussed in this paper are derived from a basic information-theoretic approach. The number (p + q) is merely the total number of parameters in the model. It arises as the degrees of freedom of a χ² distributed random variable, justified by a large sample approximation that relies on the Central Limit Theorem. Akaike [12] discusses application of this statistic to factor analysis, principal component analysis, analysis of variance, and multiple regression, in addition to the autoregression of time series with which electrical engineers are familiar.

$$S(p, q) = n \log \hat\sigma^2 + (p + q)\, g(n) \qquad (4.2)$$

You will recognize that $\hat\sigma^2$ is the maximum likelihood estimate, or its approximation, for the residual variance $\sigma^2$. The term $(p + q)\, g(n)$ is a nonnegative penalty term which increases as the number of parameters increases, while the term $n \log \hat\sigma^2$ tends to decrease as the number of parameters increases. Different choices of the function $g(n)$ produce criteria which you may recognize. When $g(n) = 2$, we get the $AIC(p, q)$ criterion. The $BIC(p, q)$ criterion is obtained by selecting $g(n) = \log n$. The HQ criterion is obtained by $g(n) = c \log\log n$, where c is a specified constant. EDC is obtained by $q = 0$ and $g(n) = \gamma(n)$, where $\gamma(n)$ is a sequence of positive numbers such that

$$\lim_{n\to\infty} \gamma(n)/n = 0, \qquad \lim_{n\to\infty} \gamma(n) = \infty.$$

A variation on this is obtained by selecting $\gamma(n)$ such that

$$\lim_{n\to\infty} \gamma(n)/n = 0, \qquad \lim_{n\to\infty} \gamma(n)/\log\log n = \infty.$$

They call attention to a survey of univariate order determination methods by de Gooijer, Abraham, Gould, and Robinson [69].
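Equation 4.2 needs only the residual variance $\hat\sigma^2$ for each candidate order. The Python sketch below treats the autoregressive case q = 0 and compares the penalties $g(n) = 2$, $\log n$, and $c \log\log n$ on hypothetical residual variances for orders 0 through 3 with n = 100. The numbers are illustrative only, not results from any data set, and the value c = 2.5 is likewise an arbitrary choice. With these inputs AIC accepts the small drop in $\hat\sigma^2$ from order 1 to order 2, while BIC and HQ stop at order 1, in line with the tendency of AIC to overestimate the order.

```python
import math


def criterion(residual_variances, n, g):
    """S(p) = n log sigma2_p + p g(n) for p = 0, 1, ... (equation 4.2 with q = 0)."""
    return [n * math.log(s2) + p * g(n) for p, s2 in enumerate(residual_variances)]


def select_order(residual_variances, n, g):
    """Order p minimizing the penalized criterion."""
    scores = criterion(residual_variances, n, g)
    return min(range(len(scores)), key=scores.__getitem__)


penalties = {
    "AIC": lambda n: 2.0,
    "BIC": lambda n: math.log(n),
    "HQ (c = 2.5)": lambda n: 2.5 * math.log(math.log(n)),
}

# Hypothetical residual-variance estimates for AR orders 0..3 from n = 100 samples.
sigma2 = [2.0, 1.0, 0.98, 0.97]
for name, g in penalties.items():
    print(name, select_order(sigma2, 100, g))  # AIC 2, BIC 1, HQ (c = 2.5) 1
```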



4.3.1  Akaike  Information  Criterion  (AIC) 

The definition given by Akaike in his December 1974 paper [16] is given in equation 4.3.

$$AIC = -2 \max\{\log(jpdf)\} + 2(df) \qquad (4.3)$$

The term jpdf is the joint probability density function, and df is the number of independently adjusted parameters in the model; you choose the model yielding the minimum AIC. It is derived using the Kullback-Leibler mean information measure [155] (pp. 26-27). The requirements and assumptions are: (1) the distribution must be a regular member of the exponential class in the sense of Hogg and Craig (pp. 357-358) [109], (2) the large sample case (see p. 718, left, bottom [16]), (3) the third and higher order terms of a Taylor series expansion are dropped (see p. 718, right, middle [16]), (4) $\int f'(x, \theta_0)\, dx = 0$ and $\int f''(x, \theta_0)\, dx = 0$ (see Kullback, p. 27, item 3 [155]), and (5) AIC is computed for each model considered. Parzen [203] defines the AIC order estimate as the value of m minimizing

$$AIC(m) = \log \hat\sigma_m^2 + \frac{2m}{T},$$

where $\hat\sigma_m^2$ is the estimator of the mean-square prediction error $\sigma_m^2$, and T is the total number of samples. Cremona and Brandon [63] give the following expressions for AIC:

$$AIC(M) = N \ln V_N + 2p$$

or

$$W_N = N \ln V_N + \varphi(N, p),$$

where $\varphi(N, p) = 2p$. The term $V_N$ is the minimum of some loss function $V_N(\theta, \Lambda)$ (quadratic error criterion or likelihood criterion), M is the model order, and N is the number of samples. The quantity p is the size of the observation vector.

Note that we get AIC by selecting g(n) = 2 in equation 4.2. Pukkila and Krishnaiah [211] report that Shibata [241] proved that AIC is not a statistically consistent estimator for the order of a univariate autoregressive model. Instead, the AIC criterion tends to overestimate the order of an AR(p) model. Similarly, AIC does not produce a consistent estimate for the order of an ARMA(p, q) model [101]. Although Rissanen [226] regards consistency as a generally necessary property for any criterion, he notes that consistency does not in itself guarantee good estimation results for small samples.

Kashyap [129] was one of the first to seriously challenge AIC. He showed that AIC was not statistically consistent: for the AIC rule, the probability of error is not less than 0.156 even when n tends to infinity. Kashyap recommends that attention be restricted to the class of consistent decision rules, and he proposes a consistent decision function.

In 1983, Wax and Kailath [278] proposed taking the number of free adjusted parameters within a model to be k(2p − k) + 1, where k is the test order and p is the size of the observation vector, which is sampled N times. These vectors are assumed to be independent and identically distributed according to the real multivariate normal distribution $N_p(0, R)$. With this adjustment, AIC is modified as shown in equation 4.4. They observed that AIC yields an inconsistent estimate that tends, in the large-sample limit, to overestimate the true rank.


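$$AIC(k) = -2 \log\left(\frac{\prod_{i=k+1}^{p} l_i^{1/(p-k)}}{\frac{1}{p-k}\sum_{i=k+1}^{p} l_i}\right)^{(p-k)N} + 2k(2p - k) \qquad (4.4)$$

where $l_1 \ge l_2 \ge \cdots \ge l_p$ are the eigenvalues of the sample covariance matrix.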
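The sketch below evaluates equation 4.4 from the eigenvalues of a sample covariance matrix, assuming the standard Wax-Kailath form of the first term, in which the p − k smallest eigenvalues enter through the ratio of their geometric mean to their arithmetic mean. The simulated array (six sensors, two real Gaussian sources with random gains, unit-variance noise, 200 snapshots) and the function name are hypothetical.

```python
import numpy as np


def wax_kailath_aic(eigs, n_snapshots):
    """AIC(k), k = 0..p-1, from sample-covariance eigenvalues (equation 4.4 form)."""
    l = np.sort(np.asarray(eigs, dtype=float))[::-1]  # l_1 >= ... >= l_p
    p = l.size
    scores = []
    for k in range(p):
        tail = l[k:]  # the p - k smallest eigenvalues
        # log of (geometric mean / arithmetic mean); never positive
        log_ratio = np.mean(np.log(tail)) - np.log(np.mean(tail))
        scores.append(-2.0 * (p - k) * n_snapshots * log_ratio + 2.0 * k * (2 * p - k))
    return np.array(scores)


# Hypothetical simulation: p = 6 sensors, 2 real Gaussian sources, unit-variance noise.
rng = np.random.default_rng(0)
p, n_snapshots, n_sources = 6, 200, 2
steering = rng.standard_normal((p, n_sources))
signals = 3.0 * rng.standard_normal((n_sources, n_snapshots))
x = steering @ signals + rng.standard_normal((p, n_snapshots))
eigs = np.linalg.eigvalsh(x @ x.T / n_snapshots)

k_hat = int(np.argmin(wax_kailath_aic(eigs, n_snapshots)))
print(k_hat)
```

With sources this strong the minimizer cannot fall below 2, because a large signal eigenvalue left among the "noise" eigenvalues inflates the first term enormously; whether it lands exactly on 2 or overshoots depends on the noise realization, which is the overestimation behavior described above.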


A few words about consistency are in order at this point because of the widespread criticism of Akaike's work. “Consistency” in statistics is a technical term. An estimator that depends on the sample size n is called consistent if it converges in probability to the true parameter value as n tends to infinity; asymptotic unbiasedness alone is not enough. Cochran (pp. 21-22) [55] has the following to say about estimators and consistency.


The precision of any estimate made from a sample depends both on the method by which the estimate is calculated from the sample data and on the plan of sampling. . . . When studying any formula that is presented, the reader should make sure that he or she knows the specific method of estimation for which the formula has been established. . . . [In the context of sampling theory,] a method of estimation is called consistent if the estimate becomes exactly equal to the population value when n = N, that is, when the sample consists of the whole population. . . . Consistency is a desirable property of estimators. On the other hand, an inconsistent estimator is not necessarily useless, since it may give satisfactory precision when n is small compared to N. . . . In classical statistics, an estimator is called consistent if the probability that it is in error by more than any given amount tends to zero as the sample becomes large.

Bickel and Doksum (pp. 134, 141, 225) [40] concur with this remark and have the following to say about various kinds of estimators. The notions of consistency, asymptotic mean, variance, and unbiasedness are properties of the sequence of estimates $\{T_n(x_1, \ldots, x_n)\}$ for $n \ge 1$, not of any single $T_n$. These are properties of the method of maximum likelihood, not of the maximum likelihood estimate for a particular sample size. . . . Small sample studies comparing the behavior of uniformly minimum variance unbiased estimators (UMVU) and MLEs are inconclusive. Simple examples in which there are many nuisance parameters are known for which MLEs behave very badly even for large samples. Neither MLEs nor UMVU estimates are satisfactory in general if one takes a Bayesian or minimax point of view. Nor are they necessarily robust. . . . Likelihood ratio tests are based on heuristic grounds.

On  this  basis,  there  is  insufficient  evidence  to  discredit  Akaike’s  work.  We 
still  have  work  to  do  for  the  small  sample  case. 


4.3.2  Bayesian  Information  Criterion  (BIC) 

Pukkila and Krishnaiah [211] credit Schwarz [239] and Rissanen [224] with independently developing BIC starting from different points. BIC is defined in equation 4.5, which is obtained by letting g(n) = log n in equation 4.2. BIC produces a consistent estimate $(\hat p, \hat q)$ for the order of an ARMA model.

$$BIC(p, q) = n \log \hat\sigma^2 + (p + q) \log n \qquad (4.5)$$

4.3.3  Kashyap  Information  Criterion  (KIC) 

This is a variant of AIC. This discussion is based on [129]. Let the estimate of the unknown order $m_0$ based on $Y_N$ be given by

$$m^* = \arg\min_m \,[d_m(Y_N)],$$

where

$$d_m(Y_N) = N \ln \hat\rho_m + m f(N).$$

The quantity $\hat\rho_m$ is the residual variance for the fitted autoregressive model having m lag terms. This term can be recursively computed from $Y_N$; see references [71][169]. The deterministic function f(N) satisfies f(N) > 0, f(N) →


4.3.4  Hannan-Quinn  (HQ) 
Pukkila and Krishnaiah [211] cite Hannan and Quinn [101] as the source for the HQ criterion given in equation 4.6, where c is a constant to be specified. This equation is obtained by letting

$$g(n) = c \log\log n$$

in equation 4.2. Select a constant c > 2 to guarantee a strongly consistent order estimate.

$$HQ(p, q) = n \log \hat\sigma^2 + (p + q)\, c \log\log n \qquad (4.6)$$

4.3.5  Efficient  Detection  Criterion  (EDC) 

Zhao, Krishnaiah, and Bai [297] proposed the procedure for the white noise case now known as the Efficient Detection Criterion (EDC). Efficiency is a technical term in statistics: an estimator is called efficient if it attains the Cramér-Rao lower bound. Zhao, Krishnaiah, and Bai [298] extended that work to the colored noise case for signals and noise having independent real Wishart covariance matrices. They considered the asymptotic case. Bai, Krishnaiah, and Zhao [36] define EDC as follows. Let $x(t) = A s(t) + n(t)$, where the column signal vector $s(t)$ and the column noise vector $n(t)$ are complex random vectors distributed independently with mean 0. Let the matrix $X = [x(t_1), \ldots, x(t_n)]$ be the sample of size n of the process $x(t)$. The covariance of $s(t)$ is given by Ψ, and the covariance of $n(t)$ is given by $\sigma^2 I_p$, where $I_p$ is the $p \times p$ identity matrix. The matrix $A = [A(\phi_1), \ldots, A(\phi_q)]$ is complex, and each $\phi_i$ is a vector of unknown parameters associated with the i-th signal. The number of unknown parameters for each signal is assumed known. Let the eigenvalues of Σ (the covariance matrix of $x(t)$) be $\lambda_1 \ge \cdots \ge \lambda_p$. Let $S_n$ be the maximum likelihood estimator of Σ, where $n S_n = X X^H$, and let the sample eigenvalues of $S_n$ be given by $l_1 \ge \cdots \ge l_p$. Let $H_q$ be the hypothesis that the number of signals is equal to q. Thus

$$H_q: \lambda_q > \lambda_{q+1} = \cdots = \lambda_p = \sigma^2.$$

When $\sigma^2$ is unknown and $\{x(t_i)\}_{i=1}^{n}$ are independently distributed as complex normal, the logarithm of the likelihood ratio test statistic for $H_q$ is given by

$$L(q) = n\left\{\sum_{i=q+1}^{p} \log l_i - (p - q) \log\left(\frac{1}{p - q}\sum_{i=q+1}^{p} l_i\right)\right\}.$$

Then EDC is given by equation 4.7.

$$EDC(k, C(n)) = -2L(k) + \nu(k, p)\, C(n) \qquad (4.7)$$

In equation 4.7,

$$\nu(k, p) = k(2p - k + 1) + 1$$

is the number of free parameters when $H_k$ is true. Then the estimate $\hat q$ of q is the value that satisfies equation 4.8.

$$EDC(\hat q, C(n)) = \min\{EDC(0, C(n)), \ldots, EDC(p - 1, C(n))\} \qquad (4.8)$$

The quantity C(n) is chosen so that it satisfies the following conditions: 1) $\lim_{n\to\infty} C(n)/n = 0$ and 2) $\lim_{n\to\infty} C(n)/\log\log n = \infty$. When $\sigma^2$ is known, it can be assumed to be unity without loss of generality. Then $EDC^*$ is given by equation 4.9.

$$EDC^*(\hat q, C(n)) = \min\{EDC^*(0, C(n)), \ldots, EDC^*(p - 1, C(n))\} \qquad (4.9)$$

In equation 4.9, the individual entries over which the minimum is taken are given by equation 4.10.

$$EDC^*(k, C(n)) = -2L^*(k) + \nu^*(k, p)\, C(n) \qquad (4.10)$$

In computing the term $L^*$, r is the number of sample eigenvalues $l_i$ greater than unity, where

$$L^*(k) = n \sum_{i=1+\min(r,k)}^{p} \left(\log l_i + 1 - l_i\right).$$

The 1989 paper [36] gives bounds, under certain conditions, on the probability of a wrong decision. In this paper, Bai et al. point out that the estimator is statistically consistent, that the estimated number of signals converges rapidly to the true value, and that no threshold value is required to form the estimator. This paper is a good entry point into the literature on information-theoretic approaches.
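A minimal Python sketch of equations 4.7 and 4.8, assuming σ² unknown and the choice C(n) = log n, which satisfies both conditions on C(n) given above because $(\log n)/n \to 0$ and $(\log n)/\log\log n \to \infty$. The function name and the eigenvalues are hypothetical; the eigenvalues are chosen to mimic two strong sources in white noise with n = 100 snapshots and are not taken from [36].

```python
import math


def edc_scores(eigs, n, C):
    """EDC(k, C(n)) = -2 L(k) + nu(k, p) C(n) for k = 0, ..., p - 1 (equation 4.7)."""
    l = sorted(eigs, reverse=True)  # l_1 >= ... >= l_p
    p = len(l)
    scores = []
    for k in range(p):
        tail = l[k:]
        m = p - k
        # log likelihood ratio L(k): compares the m smallest eigenvalues with their mean
        L = n * (sum(math.log(v) for v in tail) - m * math.log(sum(tail) / m))
        nu = k * (2 * p - k + 1) + 1  # free parameters under H_k, as given in the text
        scores.append(-2.0 * L + nu * C(n))
    return scores


# Hypothetical sample eigenvalues (p = 6) from n = 100 snapshots.
eigs = [40.0, 12.0, 1.1, 1.0, 0.95, 0.9]
n = 100
scores = edc_scores(eigs, n, C=math.log)  # C(n) = log n is one admissible choice
q_hat = min(range(len(scores)), key=scores.__getitem__)
print(q_hat)  # 2
```

No threshold appears anywhere: the estimate is simply the minimizer over k, which is the property Bai et al. emphasize.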


4.3.6  White  Noise  Tests  (Ti,  TAIC,  TBIC,  THQ) 
Pukkila and Krishnaiah [211] consider the order determination problem for real-valued autoregressive (AR) models using concepts motivated by Box and Jenkins [42]. Testing the adequacy of a fitted model is based on the estimated autocorrelation structure of the residual series from the estimated model. Starting from a simple, parsimoniously parameterized model, a model builder adds new parameters until the residual series is close enough to white noise. Pukkila and Krishnaiah accomplish this by creating a family of test statistics built from the forms of AIC, BIC, and HQ.

For the autoregressive (AR) model, equation 4.2 is minimized for $p = 0, 1, \ldots, p^*$, where $p^*$ is the largest model order we are willing to consider, and $q = 0$ is used to restrict the case to the AR model. They use the Hannan and Quinn estimator for the AR model residual variance given by

where $\{\hat\phi_k\}$ are the Yule-Walker estimates of the autoregressive coefficients $\{\phi_k\}$, and $\{r(k)\}$ are the autocorrelations. The autocorrelations are computed by $r(k) = c(k)/c(0)$, where

$$c(k) = \frac{1}{n}\sum_{t=1}^{n-k} (x_t - \bar x)(x_{t+k} - \bar x)$$

for $1 \le k \le p$. The $\hat\phi_k$ are obtained by solving the Yule-Walker equation 4.11.

$$\begin{bmatrix} r(0) & r(1) & \cdots & r(k-1) \\ r(1) & r(0) & \cdots & r(k-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(k-1) & r(k-2) & \cdots & r(0) \end{bmatrix} \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_k \end{bmatrix} = \begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(k) \end{bmatrix} \qquad (4.11)$$

The test statistic is then given by equation 4.12.

$$T_1(p^*) = \min\left\{0,\ \min_{1 \le p \le p^*}\left[n \log\frac{\hat\sigma_p^2}{\hat\sigma_0^2} + p\, g(n)\right]\right\} \qquad (4.12)$$

If $T_1(p^*) < 0$, then reject $H_0$: {$x_t$ is generated by an AR(0) process} in favor of $H_1$: {$x_t$ is generated by an AR(k) process where k > 0}. To use a traditional order estimation criterion, substitute the corresponding expression for g(n). Thus, to obtain $TAIC(p^*)$ corresponding to AIC, select g(n) = 2. Similarly, choose g(n) = log n for $TBIC(p^*)$ and choose

$$g(n) = c \log\log n$$

to get $THQ(p^*)$. Pukkila and Krishnaiah also provide the asymptotic values of the significance levels α(n) and lower bounds for the power functions of these proposed tests.
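The white noise test can be sketched in Python as follows. The autocovariances and the Yule-Walker solve follow the equations in the text. For the residual variance the sketch assumes the Yule-Walker innovation variance $\hat\sigma_p^2 = c(0)\,[1 - \sum_{k=1}^{p}\hat\phi_k r(k)]$, with the $\hat\phi_k$ taken from the order-p solve, as a stand-in for the Hannan and Quinn estimator, and it takes the statistic as the smaller of 0 and the penalized log-variance ratios over $p = 1, \ldots, p^*$. The function names, the simulated series, and the choice g(n) = log n (the TBIC version) are hypothetical.

```python
import numpy as np


def autocovariance(x, max_lag):
    """c(k) = (1/n) * sum_{t=1}^{n-k} (x_t - xbar)(x_{t+k} - xbar) for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    return np.array([np.dot(d[: n - k], d[k:]) / n for k in range(max_lag + 1)])


def yule_walker(r, k):
    """Solve the Toeplitz system of equation 4.11 for (phi_1, ..., phi_k)."""
    R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
    return np.linalg.solve(R, r[1 : k + 1])


def white_noise_statistic(x, p_max, g):
    """min{0, min_p [n log(sigma2_p / sigma2_0) + p g(n)]}; reject white noise if < 0."""
    n = len(x)
    c = autocovariance(x, p_max)
    r = c / c[0]
    stat = 0.0
    for p in range(1, p_max + 1):
        phi = yule_walker(r, p)
        # Yule-Walker innovation variance (assumed stand-in for the Hannan-Quinn estimator)
        sigma2_p = c[0] * (1.0 - np.dot(phi, r[1 : p + 1]))
        stat = min(stat, n * np.log(sigma2_p / c[0]) + p * g(n))
    return stat


# Hypothetical data: Gaussian white noise and an AR(1) series with coefficient 0.6.
rng = np.random.default_rng(1)
n = 500
e = rng.standard_normal(n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.6 * ar1[t - 1] + e[t]

# g(n) = log n gives the TBIC version of the test.
print(white_noise_statistic(e, 5, np.log), white_noise_statistic(ar1, 5, np.log))
```

For the AR(1) series the statistic is strongly negative, so white noise is rejected; for the white-noise series it is usually 0, although a small rejection probability remains for any fixed sample.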


4.3.7  Minimum  Description  Length  (MDL) 


The comments regarding MDL are based on [224] [225] [226] [81].

$$(\hat n, \hat\theta) = \arg\min_{n, \theta}\left[-\log p(x; \theta) + \frac{1}{2} n \log N\right] = \arg\min_{n, \theta} I_\theta(x)$$

The number of parameters in the parameter vector θ is n, and N is the length of the observation sequence. $I_\theta(x)$ is the information of the sequence x with respect to the given probability distribution family. Feder's article is very readable.

The reader is encouraged to consult reference [225], which generalizes MDL so that it is invariant with respect to all linear coordinate transformations. Rissanen does not compromise on technical quality; his writing is very clear, and he accompanies his developments with insightful remarks.

Define

$$\log^*(y) = \log y + \log\log y + \cdots$$

and

$$\|\theta\|_{M(\theta)} = \sqrt{\langle \theta, M(\theta)\, \theta \rangle},$$

where $\langle \cdot, \cdot \rangle$ is the inner product of the k-component parameter vector θ with the product of the information matrix $M(\theta) = n \times J_\theta$ and θ. The value of k is the model order. The information matrix $J_\theta$ is defined by

Then the MDL criterion is given by equation 4.13.

$$-\log P(y, \theta) = -\log P(y \mid \theta) + \log^*\left(\|\theta\|_{M(\theta)}\right) \qquad (4.13)$$

Rissanen calls $-\log P(y, \theta)$ the joint ideal code length, which is to be minimized as a function of model order k.
As discussed earlier, in 1983 Wax and Kailath [278] proposed taking the number of free adjusted parameters within a model to be k(2p − k) + 1, where k is the test order and p is the size of the observation vector, which is sampled N times. These vectors are assumed to be independent and identically distributed according to the real multivariate normal distribution $N_p(0, R)$. With this adjustment, MDL is modified as shown in equation 4.14. They observed that MDL is a consistent estimator of the true system rank.

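$$MDL(k) = -\log\left(\frac{\prod_{i=k+1}^{p} l_i^{1/(p-k)}}{\frac{1}{p-k}\sum_{i=k+1}^{p} l_i}\right)^{(p-k)N} + \frac{1}{2} k(2p - k) \log N \qquad (4.14)$$

where $l_1 \ge \cdots \ge l_p$ are the eigenvalues of the sample covariance matrix.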

In  1985,  Wax  and  Kailath  [280]  apply  MDL  to  the  problem  of  estimating 
the  number  of  signals  in  a  multi-channel  time  series.  In  this  paper,  they 
generalize  earlier  proofs  that  MDL  is  a  consistent  estimator. 
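A short Python sketch of equation 4.14, assuming the standard Wax-Kailath form in which the p − k smallest eigenvalues enter through the ratio of their geometric mean to their arithmetic mean. Unlike a fixed penalty, the term $\tfrac12 k(2p - k)\log N$ grows with N, which is what drives consistency. The eigenvalues (p = 6, N = 100) and the function name are hypothetical.

```python
import numpy as np


def wax_kailath_mdl(eigs, n_snapshots):
    """MDL(k), k = 0..p-1, from sample-covariance eigenvalues (equation 4.14 form)."""
    l = np.sort(np.asarray(eigs, dtype=float))[::-1]  # l_1 >= ... >= l_p
    p = l.size
    scores = []
    for k in range(p):
        tail = l[k:]  # the p - k smallest eigenvalues
        log_ratio = np.mean(np.log(tail)) - np.log(np.mean(tail))  # log(geometric/arithmetic)
        penalty = 0.5 * k * (2 * p - k) * np.log(n_snapshots)
        scores.append(-(p - k) * n_snapshots * log_ratio + penalty)
    return np.array(scores)


# Hypothetical sample eigenvalues (p = 6) from N = 100 snapshots.
eigs = [40.0, 12.0, 1.1, 1.0, 0.95, 0.9]
print(int(np.argmin(wax_kailath_mdl(eigs, n_snapshots=100))))  # 2
```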


4.3.8  Wang’s  Sphericity  Test 


Wang's Sphericity Test is my name for the test Wang and Kaveh proposed in their 1986 paper (equations 6 through 8) [275]. Let $R = E\{S_t S_t^H\}$ be the covariance matrix of the signals. Let $W = A R A^H + \sigma^2 I$ be estimated by $\hat W$ with sample eigenvalues $\{l_1, \ldots, l_p\}$, where $l_i \ge l_{i+1}$. The estimate $\hat d$ of the number

The quantity $p(d, k)$ is a chosen penalty function for the overdetermination of

you have MDL. Using these, Wang and Kaveh examined the probabilities of underestimating and overestimating the number of sources for the cases of up to two closely spaced sources in spatially white noise. Wang and Kaveh applied their findings to narrowband systems. In a 1987 paper [276], they continued their study and applied it to wideband systems. In both of these papers they studied the asymptotic case.


4.3.9 Finite Markov Chain Maximum Entropy Order Estimator (FMCME)

This reviews work reported primarily in [175]. Consider a discrete-time k-th order Markov process where each random variable $x_i$ takes on values in a finite set A. A k-th order Markov process is one where the probability of the occurrence of $x_i$ depends on the preceding k samples $\{x_{i-1}, x_{i-2}, \ldots, x_{i-k}\}$ but not only on the preceding k − 1 samples. The goal is to estimate k as accurately as possible. To measure accuracy, the following performance criterion is used. Among all estimators $\hat k$ for which the overestimation probability $P_k(\hat k > k)$ decays faster than $2^{-\lambda n}$ (for a specified λ > 0) uniformly for any Markovian probability measure $P_k$ of order k, find an estimator that minimizes the underestimation probability $P_k(\hat k < k)$ uniformly for every $P_k$.

Let $x = (x_1, \ldots, x_n) \in A^n$ be an observed sequence from the unknown k-th order Markov process. Let $s_i$ at time i specify the state of the Markov source that governs when sample $x_i$ is drawn. Thus

$$s_i = (x_{i-1}, \ldots, x_{i-k}) \in A^k.$$

Let u be an arbitrary member of A, and let s be an arbitrary member of $A^k$. Define the delta function

where δ is one when its arguments are equal and zero otherwise. Let $k_0$ be a finite integer which is an upper bound for the true order k. There are two versions of FMCME. Version $k^*$ applies when the value of $k_0$ is known. Version $k^{**}$ applies when $k_0$ is unknown.

Version $k^*$ is the estimated order you seek:

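$$k^* = \min\{j : \hat H(q_j) - \hat H(q_{k_0}) < \lambda\},$$

where

$$\hat H(q_j) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{u \in A} \delta(x_i, u) \log q_x(u \mid s_i),$$

$$q_x(u \mid s) = \begin{cases} q_x(u, s)/q_x(s), & q_x(s) > 0, \\ 0, & q_x(s) = 0, \end{cases}$$

with the states $s_i$ taken of order j. Using Rissanen's MDL, the estimator $k^*$ is asymptotically equivalent to

$$\min\left\{j : \tfrac{1}{n} MDL(j) - \tfrac{1}{n} MDL(k_0) < \lambda\right\}.$$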
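The following Python sketch computes empirical conditional entropies $\hat H(q_j)$ in bits (matching the base 2 in the $2^{-\lambda n}$ criterion) from transition counts and applies the threshold rule for $k^*$, assuming the rule compares $\hat H(q_j)$ with $\hat H(q_{k_0})$ as in the MDL form just stated. All sums start at position $k_0$ so that every order uses the same number of terms, which is a convention of this sketch. The function names, the threshold λ = 0.05, and the simulated binary first-order chain are hypothetical.

```python
import math
import random
from collections import Counter


def conditional_entropy(x, j, start):
    """Empirical H(q_j) in bits: -(1/n) sum_i log2 q_x(x_i | s_i), states of order j."""
    positions = range(start, len(x))  # start >= j so every state s_i is complete
    # state tuples are (x_{i-j}, ..., x_{i-1}); the ordering does not affect the counts
    joint = Counter((tuple(x[i - j:i]), x[i]) for i in positions)
    state = Counter(tuple(x[i - j:i]) for i in positions)
    total = -sum(cnt * math.log2(cnt / state[s]) for (s, _), cnt in joint.items())
    return total / len(positions)


def fmcme_order(x, k0, lam):
    """k* = min{j : H(q_j) - H(q_k0) < lam}; j = k0 always qualifies when lam > 0."""
    h_k0 = conditional_entropy(x, k0, k0)
    return next(j for j in range(k0 + 1) if conditional_entropy(x, j, k0) - h_k0 < lam)


# Hypothetical binary first-order chain: repeat the previous symbol with probability 0.9.
random.seed(7)
x = [0]
for _ in range(5000):
    x.append(x[-1] if random.random() < 0.9 else 1 - x[-1])

print(fmcme_order(x, k0=3, lam=0.05))
```

For this chain the order-0 entropy is close to 1 bit, while orders 1 through 3 are all close to the binary entropy of 0.1 (about 0.47 bit), so the rule returns the true order 1.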

Version $k^{**}$ applies when $k_0$ is unknown. It is based on the LZ data compression algorithm described in reference [300]. The LZ code word length of x is $u_{LZ}(x)$, which is computed by the algorithm. The unknown term $\hat H(q_{k_0})$ in the expression for $k^*$ is approximated by the normalized LZ code word length function:

$$k^{**} = \min\left\{j : \hat H(q_j) - \tfrac{1}{n} u_{LZ}(x) < \lambda\right\}$$

By applying the theory of large deviations, the estimator $k^*$ has been extended in reference [176] to exponential families. This is applicable to the Gaussian linear regression model and the autoregressive (AR) model.
4.3.10 Coherent MDL

This approach, reported in [281], yielded two test statistics. The first is most suitable for the detection-only problem. The second is for the joint detection and estimation problem. In this approach the signals are considered as unknown constants without an assumed stochastic model. The motivation for this approach is that previous approaches were not applicable to the case of fully correlated signals, such as occur in a specular multipath situation. Both approaches were proven to produce statistically consistent estimators. For the detection problem, the MDL estimator for the number of sources is given in equation 4.16.


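$$\hat k_{MDLB} = \arg\min_{k \in \{0, \ldots, p-1\}} MDLB(k) \qquad (4.16)$$

where

$$MDLB(k) = M(p-k) \log\left(\frac{\frac{1}{p-k}\sum_{i=k+1}^{p} l_i(\hat\theta^{(k)})}{\left(\prod_{i=k+1}^{p} l_i(\hat\theta^{(k)})\right)^{1/(p-k)}}\right) + \frac{1}{2} k(2p - k + 1) \log M \qquad (4.17)$$

with $\hat\theta^{(k)}$ given by

$$\hat\theta^{(k)} = \arg\min_{\theta^{(k)}} \log \frac{\frac{1}{p-k}\sum_{i=k+1}^{p} l_i(\theta^{(k)})}{\left(\prod_{i=k+1}^{p} l_i(\theta^{(k)})\right)^{1/(p-k)}} \qquad (4.18)$$

The combined detection-estimation estimator of the number of sources is given in equation 4.19.

$$\hat k_{MDLC} = \arg\min_{k \in \{0, \ldots, p-1\}} MDLC(k) \qquad (4.19)$$

where

$$MDLC(k) = M(p-k) \log\left(\frac{\frac{1}{p-k}\sum_{i=k+1}^{p} l_i(\hat\theta^{(k)})}{\left(\prod_{i=k+1}^{p} l_i(\hat\theta^{(k)})\right)^{1/(p-k)}}\right) + \frac{1}{2} k(2p - k + 1) \log M \qquad (4.20)$$

with $\hat\theta^{(k)}$ given by

$$\hat\theta^{(k)} = \arg\min_{\theta^{(k)}} \sum_{i=k+1}^{p} l_i(\theta^{(k)}) \qquad (4.21)$$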


4.3.11  Maximum  Likelihood  (ML) 

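This is information taken from [81]. Assume that the desired probability distribution $p(\cdot)$ belongs to a parameterized distribution family $\mathcal{P}_\Theta$ indexed by parameter vector $\theta \in \Theta$. Then the maximum likelihood criterion will choose $\hat p$ from $\mathcal{P}_\Theta$ by

$$\hat p = \arg\max_{p \in \mathcal{P}_\Theta} \log p(x) \qquad\text{or}\qquad \hat\theta = \arg\max_{\theta \in \Theta} \log p(x; \theta)$$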
p€Pe  ^£0 


4.3.12 Maximum Entropy (ME)

Let the desired probability distribution $p(\cdot)$ belong to a set of distributions $\mathcal{P}$, where

$$\mathcal{P} = \{p(x) \mid E_p[g(x)] = \bar g\},$$

such that $\bar g$ is known. The given averages are the only information available. Then the chosen distribution is $\hat p$, where

$$\hat p = \arg\max_{p \in \mathcal{P}} H(p) = \arg\max_{p \in \mathcal{P}} \left[-\int_X p(x) \log p(x)\, dx\right].$$
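For a finite sample space the integral becomes a sum, and with a single constraint $E_p[g(x)] = \bar g$ the maximizing distribution has the exponential form $p(x) \propto \exp\{\beta g(x)\}$, with β chosen so that the constraint holds. The Python sketch below is a discrete analog under these assumptions with g(x) = x. It finds β by bisection for a hypothetical six-sided die constrained to have mean 4.5 and compares the resulting entropy with that of another distribution having the same mean; the function names and search bounds are hypothetical.

```python
import math


def max_entropy_pmf(values, g_bar, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy pmf on `values` subject to E_p[x] = g_bar (here g(x) = x)."""

    def pmf(beta):
        w = [math.exp(beta * v) for v in values]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(beta):
        return sum(pi * v for pi, v in zip(pmf(beta), values))

    # The mean is increasing in beta (its derivative is a variance), so bisection applies.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < g_bar:
            lo = mid
        else:
            hi = mid
    return pmf(0.5 * (lo + hi))


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


faces = [1, 2, 3, 4, 5, 6]
p_me = max_entropy_pmf(faces, 4.5)
other = [0.0, 0.0, 0.0, 0.5, 0.5, 0.0]  # another pmf with mean 4.5
print([round(pi, 4) for pi in p_me])
print(entropy(p_me) > entropy(other))  # True
```

The probabilities increase geometrically across the faces, which is the exponential form; any other distribution meeting the same average has lower entropy.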
The reader is strongly encouraged to consult [81] for its remarkably clear presentation.
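The maximum-entropy choice can be sketched for a finite support with the single constraint $g(x) = x$ (a prescribed mean), the classic loaded-die illustration; this choice of g is an assumption made for illustration, and a sum replaces the integral. Maximizing H(p) under the constraint gives $p(x) \propto e^{\lambda x}$, and because the implied mean increases monotonically in λ, bisection finds the multiplier.

```python
import numpy as np

def max_entropy_with_mean(support, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy pmf on a finite support subject to E_p[x] = target_mean.

    The solution has the exponential-family form p(x) proportional to exp(lam * x);
    lam is found by bisection because the implied mean is increasing in lam.
    """
    x = np.asarray(support, dtype=float)
    if not (x.min() < target_mean < x.max()):
        raise ValueError("target mean must lie strictly inside the support range")

    def pmf(lam):
        z = lam * x
        w = np.exp(z - z.max())        # subtract the max for numerical stability
        return w / w.sum()

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pmf(mid) @ x < target_mean:
            lo = mid
        else:
            hi = mid
    return pmf(0.5 * (lo + hi))

faces = np.arange(1, 7)
p_fair = max_entropy_with_mean(faces, 3.5)     # no information beyond the fair mean: uniform
p_loaded = max_entropy_with_mean(faces, 4.5)   # tilted smoothly toward the high faces
```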

Miller  and  Snyder  [182]  remark  that  the  probability  density  maximizing 
entropy  is  identical  to  the  conditional  density  of  the  complete  data  given  the 
incomplete  data.  This  equivalence  comes  from  viewing  the  measurements  as 
specifying the domain over which the density is defined. The identity between the maximum-entropy density and the conditional density comes from the fact that the maximum-likelihood estimates may be obtained via a joint maximization (minimization) of the entropy function (Kullback-Leibler divergence).


4.3.13  Criterion  Autoregressive  Transfer  Function 

The  Criterion  Autoregressive  Transfer  (CAT)  function  approach  is  reported  in 
[203]. 

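$$\mathrm{CAT}(m) = 1 - \frac{\hat{\sigma}^2_\infty}{\tilde{\sigma}^2_m} + \frac{m}{N}$$

where $\hat{\sigma}^2_\infty$ is the estimator of the mean-square prediction error $\sigma^2_\infty$ of an infinite-order autoregressive model AR(∞), and N is the number of observations. The quantity in the denominator is defined as $\tilde{\sigma}^2_m = \frac{N}{N-m}\hat{\sigma}^2_m$, which is the unbiased estimator of $\sigma^2_m$, the mean-square prediction error of the order-m fit. The value m minimizing CAT(m) is chosen not as the order of an autoregressive model fitted to the observed time series, but as the order of an autoregressive estimator of the infinite-order autoregressive transfer function (ARTF) $g_\infty(\cdot)$.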
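A minimal sketch of the CAT rule. It assumes a real-valued series, Yule-Walker (Levinson-Durbin) residual variances for $\hat{\sigma}^2_m$, and, as a hypothetical stand-in for $\hat{\sigma}^2_\infty$, the residual variance of the highest order fitted. These choices are assumptions of the sketch, not taken from [203].

```python
import numpy as np

def yule_walker_error_variances(x, max_order):
    """Residual variances sigma2_hat[m], m = 0..max_order, via the Levinson-Durbin recursion."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    r = np.array([x[: N - k] @ x[k:] / N for k in range(max_order + 1)])  # biased autocovariances
    a = np.zeros(0)                 # AR coefficients of the current order
    err = r[0]
    errs = [err]
    for m in range(1, max_order + 1):
        kappa = (r[m] - a @ r[m - 1 : 0 : -1]) / err   # reflection coefficient of stage m
        a = np.concatenate([a - kappa * a[::-1], [kappa]])
        err *= 1.0 - kappa ** 2
        errs.append(err)
    return np.array(errs)

def cat(sigma2_hat, sigma2_inf, N):
    """CAT(m) = 1 - sigma2_inf / sigma2_tilde[m] + m / N, sigma2_tilde[m] = N/(N-m) * sigma2_hat[m]."""
    m = np.arange(len(sigma2_hat))
    sigma2_tilde = N / (N - m) * np.asarray(sigma2_hat)
    return 1.0 - sigma2_inf / sigma2_tilde + m / N

rng = np.random.default_rng(0)
N = 2000
e = rng.standard_normal(N)
x = np.zeros(N)
for t in range(2, N):
    x[t] = 0.75 * x[t - 1] - 0.5 * x[t - 2] + e[t]     # AR(2) test series
sigma2_hat = yule_walker_error_variances(x, max_order=20)
sigma2_inf = sigma2_hat[-1]     # hypothetical proxy for the AR(infinity) error variance
m_hat = int(np.argmin(cat(sigma2_hat, sigma2_inf, N)))
```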
4.3.14 Final Prediction Error Criterion (FPE)

This  is  taken  from  [63]. 


FPE(M) =

Rissanen [226] credits Davisson [66] with the first statement of “final prediction error.” Soderstrom [250] credits Akaike with proposing FPE in 1969 in reference [3].

4.3.15  Weak  Parameter  Criterion  (WPC) 

Broersen’s 1985 paper [46] suggests that weak parameters should be removed if the squares of their estimates are less than twice their expectation for a white noise signal. The factor 2 used as the measure of significance is derived from asymptotic conditions. WPC is based on the same principles as Mallows’ Cp, FPE, and AIC. Choose as the model order the value of M that minimizes

$$\mathrm{WPC}(M) = \frac{S_M}{\prod_{j=0}^{M}(1-2v_j)}$$

In this expression, $v_0 = 0$. When Yule-Walker estimates are used for the model reflection coefficients,

$$v_j = \frac{N-j}{N(N+2)}$$

When Burg estimates are used for the model reflection coefficients, $v_j = 1/(N-j+1)$. The quantity $S_M$ is the residual reduction by adding reflection
σ². The notations SSE_p and MSE are common in the regression and linear
models  statistical  literature. 
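The weak parameter criterion of Section 4.3.15 can be sketched as follows. The sketch assumes the reflection coefficients $k_j$ are already estimated and that the residual variance follows the Levinson-type update $S_M = S_0\prod_{j=1}^{M}(1-k_j^2)$; that update is an assumption of this sketch. Dividing by $\prod(1-2v_j)$ means that adding stage j lowers WPC exactly when $k_j^2 > 2v_j$, which is the "twice the white-noise expectation" rule.

```python
import numpy as np

def wpc(reflection, S0, N, method="burg"):
    """WPC(M) = S_M / prod_{j=0}^{M} (1 - 2 v_j) for M = 0..len(reflection), with v_0 = 0."""
    k = np.asarray(reflection, dtype=float)      # k[j-1] is the j-th reflection coefficient
    j = np.arange(1, len(k) + 1)
    if method == "burg":
        v = 1.0 / (N - j + 1)
    elif method == "yule-walker":
        v = (N - j) / (N * (N + 2))
    else:
        raise ValueError("method must be 'burg' or 'yule-walker'")
    S = S0 * np.concatenate([[1.0], np.cumprod(1.0 - k ** 2)])     # residual variance S_M
    denom = np.concatenate([[1.0], np.cumprod(1.0 - 2.0 * v)])      # v_0 = 0 contributes 1
    return S / denom

# hypothetical Burg estimates: one strong stage, then a weak one
scores = wpc([0.5, 0.01], S0=1.0, N=100)
M_hat = int(np.argmin(scores))      # the weak second stage is rejected
```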


4.3.18  Bayesian  Quickest  Decision 

This approach adds to the classical Bayes cost function for wrong decisions a penalty for delays in detecting a signal. Minimizing the average risk function leads to the optimum decision regions. A more detailed description of this approach would essentially repeat the original paper, so the interested reader is referred to the original work by Bouvet [41]. This paper should be read together with Pelkowitz and Schwartz’s 1987 paper [206].

4.3.19  Quickest  Detection  Sample  Size 

This method was proposed in [206]. The goal is to find the sample size M that minimizes the mean time to detection $M_D$ for detecting a sudden change in the statistics of an observed process, for a given mean time between false alarms $M_F = (\text{false alarm rate})^{-1}$. Let β be the single-sample signal-to-noise ratio. The paper shows that, for small β and large $M_F$, the optimum sample size M and the system performance depend on $M_F$ and β only through the product $\beta\sqrt{M_F}$. Graphs are provided in the paper for choosing the parameters of the solution.
Let

$$R = \frac{M_D}{M_F} = \frac{\text{mean number of samples to detection}}{\text{mean number of samples between false alarms}}$$

and let Λ be the detection threshold, which is a function of the given test statistic, the data sample size M, and the probability of false alarm α. Call R the average sample size ratio. Let the stationary noise process have mean $\mu_0$ and variance $\sigma_0^2$. Let F(x) be the cumulative distribution function of the received random process, and let $\Phi(x) = 1 - F(x)$. Let $r_0(m)$ be the normalized autocorrelation function of the stationary noise, defined by


$$r_0(m) = \frac{E\{[x_0(n)-\mu_0][x_0(n+m)-\mu_0]\}}{\sigma_0^2} = r_0(-m)$$


and let

$$l_0 = \sum_{m=-L}^{L} r_0(m) = 1 + 2\sum_{m=1}^{L} r_0(m)$$

Under the conditions that the signal strength ρ → 0, the signal-to-noise ratio β(ρ) → 0, the mean number of samples between false alarms $M_F \to \infty$, and $\beta(\rho)\sqrt{M_F}$ is held at some fixed constant, the limiting values of the average sample size ratio R and the detection threshold Λ are given by


/t(a;^A/v/a)  1 

$  [$“*{a)  —  J 


and

$$\Lambda(\alpha; M_F) = \alpha M_F \mu_0 + \Phi^{-1}(\alpha)\,\sigma_0\sqrt{l_0\,\alpha M_F}$$
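A small numerical sketch of $l_0$ and the limiting threshold $\Lambda(\alpha; M_F)$. It assumes standardized Gaussian noise, so that $\Phi^{-1}(\alpha)$, with $\Phi = 1 - F$, is the upper-α quantile of the standard normal distribution; the text itself defines Φ for a general F. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def l0_from_autocorrelation(r0):
    """l0 = 1 + 2 * sum_{m=1}^{L} r0(m), given r0 = [r0(0) = 1, r0(1), ..., r0(L)]."""
    if abs(r0[0] - 1.0) > 1e-12:
        raise ValueError("r0 must be normalized so that r0(0) = 1")
    return 1.0 + 2.0 * sum(r0[1:])

def limiting_threshold(alpha, M_F, mu0, sigma0, l0):
    """alpha*M_F*mu0 + PhiInv(alpha)*sigma0*sqrt(l0*alpha*M_F), Gaussian-noise assumption."""
    z = NormalDist().inv_cdf(1.0 - alpha)     # upper-alpha standard normal quantile
    return alpha * M_F * mu0 + z * sigma0 * math.sqrt(l0 * alpha * M_F)

l0_white = l0_from_autocorrelation([1.0, 0.0, 0.0])        # uncorrelated noise
l0_corr = l0_from_autocorrelation([1.0, 0.5, 0.25])        # positively correlated noise
threshold = limiting_threshold(0.05, 2000, 0.0, 1.0, l0_white)
```

Positive noise correlation inflates $l_0$ and therefore raises the threshold, since the variance of a sum of correlated samples grows faster than for independent ones.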
4.3.20 Likelihood Ratio Test


Soderstrom [250] gives the likelihood ratio test statistic of equation 4.22 for testing between model $M_1$ and model $M_2$. Model $M_1$ is chosen if λ is close to 1.

$$\lambda = \frac{\sup_{M_1} L(\theta, \Lambda)}{\sup_{M_2} L(\theta, \Lambda)} \qquad (4.22)$$

Wilks [289] was the first to propose this statistic in 1938.
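For nested Gaussian linear models with unknown noise variance, the suprema in equation 4.22 have a closed form, $\sup L = (2\pi\,\mathrm{RSS}/N)^{-N/2}e^{-N/2}$, so that $\lambda = (\mathrm{RSS}_2/\mathrm{RSS}_1)^{N/2}$. The sketch below is a hypothetical regression example, not Soderstrom's system-identification setting. Values of λ near 1 favour the smaller model $M_1$; by Wilks's theorem, $-2\log\lambda$ is asymptotically χ² with degrees of freedom equal to the number of extra parameters in $M_2$.

```python
import numpy as np

def gaussian_lr(y, X1, X2):
    """lambda = sup L under M1 / sup L under M2 for nested Gaussian linear models.

    With unknown noise variance, sup L = (2*pi*RSS/N)^(-N/2) * exp(-N/2), so
    lambda = (RSS2 / RSS1)^(N/2).  The columns of X1 must span a subspace of X2.
    """
    y = np.asarray(y, dtype=float)
    N = y.size

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    return (rss(X2) / rss(X1)) ** (N / 2)

t = np.arange(4.0)
y = np.array([1.0, 3.0, 2.0, 4.0])
X1 = np.ones((4, 1))                      # M1: constant mean
X2 = np.column_stack([np.ones(4), t])     # M2: constant plus linear trend
lam = gaussian_lr(y, X1, X2)              # (1.8 / 5)^2 = 0.1296
```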


4.3.21  Guttman  Lower  Bound  Criterion 

This criterion, discussed in [111][99], recommends retaining all of the principal components that contribute more total variance than does the typical normalized time series. Richman [222] notes that it is safer to choose too many components than to choose fewer than such criteria suggest. (Note that in the adaptive filtering context, we know that choosing a model order that is too large can lead to an unstable filter.)
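Because every normalized series has unit variance, the typical series contributes variance 1, and the rule reduces to keeping the eigenvalues of the correlation matrix that exceed 1 (often called the Kaiser-Guttman rule). A minimal sketch, assuming real-valued data with observations in rows; the function names are hypothetical.

```python
import numpy as np

def guttman_lower_bound(R):
    """Count the eigenvalues of the correlation matrix R that exceed 1."""
    eig = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    return int(np.sum(eig > 1.0))

def guttman_from_data(X):
    """X: (n_observations, n_series) real data; normalization happens via the correlation matrix."""
    return guttman_lower_bound(np.corrcoef(X, rowvar=False))

R = np.array([[1.0, 0.8, 0.0, 0.0],
              [0.8, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
n_keep = guttman_lower_bound(R)    # eigenvalues 1.8, 1.8, 0.2, 0.2
```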


4.3.22  Other  Significance  Tests 

Horel [111] cites other significance tests that have escaped the notice of the electrical engineering literature. These tests are given in [37][201][194][191]. Testing of complex principal components is a part of geophysical data analysis. Soderstrom [250] discusses the use of the F-test for comparing two models.


4.4  Comparisons  and  Evaluations 


Hipel  [107]  reported  on  the  use  of  AIC  in  the  context  of  geophysical  time  series. 
This  is  a  broad  ranging  paper  with  an  extensive  bibliography.  Hipel  states  the 
AIC  formula,  discusses  its  use  in  ARMA  models  for  order  determination, 
discusses  model  construction,  alternatives  to  AIC,  and  some  disadvantages  of 
AIC.  Alternatives  include  the  maximum  method,  Parzen’s  CAT,  Gray’s 
/^-statistic.  Mallows’  Cp  statistic,  and  Sawa’s  BIC  statistic.  He  also  discusses 
Akaike’s  MAICE  and  final  prediction  error  (FPE)  technique. 

Some  disadvantages  of  the  AIC  and  the  other  automatic  selection  criteria 
are  that  an  overall  statistic  tends  to  cover  up  much  of  the  information  in  the 
data, and the practitioner may lose his feeling for the inherent characteristics of the time series if he bases his decisions solely upon one statistic.
However,  when  MAICE  is  used  in  conjunction  with  the  three  stages  of  model 
construction,  there  is  no  doubt  that  MAICE  greatly  improves  the  modeling 
process. 

Soderstrom [250] observed that AIC and FPE are asymptotically equivalent to an F-test. Kundu [158] compared simulation results for several information-theoretic criteria (AIC, MDL, and EDC) and cross-validation. He observed that AIC and cross-validation perform quite well for small samples and large error standard deviation, and noted that the small sample properties of MDL and EDC have not been fully investigated. When the radian frequencies of two signals are close, the cross-validation approach performs better than any of the other methods compared.

Wang and Kaveh [275] compared the asymptotic performance of AIC and MDL as part of a study of a generalized information-theoretic order determination method, applied to an array of M sensors, that subsumes both methods. They concluded that, for up to two closely spaced sources in spatially white noise, Rissanen’s MDL penalty function results in a larger probability of underestimating, but a smaller probability of overestimating, the number of sources than Akaike’s AIC penalty function.

Zhang, Wong, Yip, and Reilly [296] compared AIC and MDL through statistical theory and simulation. They concluded that AIC is more efficient than the MDL criterion in reducing the probability of missing a detection. On the other hand, for a moderate number of snapshots, the probability of false alarm using the MDL criterion approaches zero, whereas that for the AIC remains constant; the MDL criterion is more efficient than the AIC in reducing the probability of false alarm. The choice of the penalty term by AIC emphasizes better performance under relatively lower SNR or a smaller number of snapshots (or both), at the expense of being inconsistent. The penalty term adopted by MDL emphasizes the performance when the number of snapshots is large, sacrificing the performance at relatively lower SNR or
smaller number of snapshots (or both). They cite Chen, Reilly, and Wong [54]
to  remark  that  the  penalty  function  can  be  adjusted  to  obtain  a  criterion  whose 
performance  best  satisfies  the  chosen  goal.  The  choice  depends  on  the  number 
of  snapshots  and  the  signal-to-noise  ratio.  Under  low  SNR,  both  AIC  and 
MDL  necessitate  a  large  number  of  snapshots.  The  authors  show  in  another 
paper  that  the  performance  of  both  criteria  can  be  improved  by  choosing  a 
more  appropriate  log-likelihood  function  [292]. 
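To make the AIC/MDL trade-off concrete, here is a sketch of the widely used eigenvalue-based forms due to Wax and Kailath (these exact expressions are not restated in this section). For hypothesis k, both criteria use the log ratio of geometric to arithmetic mean of the p − k smallest sample eigenvalues; AIC adds $2k(2p-k)$, while MDL adds $\frac{1}{2}k(2p-k)\log N$. Because the MDL penalty grows with N, MDL's false-alarm probability vanishes as N grows, whereas the AIC penalty stays fixed, consistent with the behaviour described above. The simulation set-up is hypothetical.

```python
import numpy as np

def wax_kailath_aic_mdl(eigenvalues, N):
    """Return (k_AIC, k_MDL) from sample-covariance eigenvalues and N snapshots."""
    l = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending order
    p = l.size
    aic, mdl = [], []
    for k in range(p):
        tail = l[k:]                                           # the p-k smallest eigenvalues
        log_gm_over_am = np.mean(np.log(tail)) - np.log(np.mean(tail))   # always <= 0
        aic.append(-2.0 * N * (p - k) * log_gm_over_am + 2.0 * k * (2 * p - k))
        mdl.append(-N * (p - k) * log_gm_over_am + 0.5 * k * (2 * p - k) * np.log(N))
    return int(np.argmin(aic)), int(np.argmin(mdl))

# simulated use: 2 sources, 6 sensors, complex Gaussian snapshots (hypothetical set-up)
rng = np.random.default_rng(5)
p, N = 6, 200
A = rng.standard_normal((p, 2)) + 1j * rng.standard_normal((p, 2))
f = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
w = (rng.standard_normal((p, N)) + 1j * rng.standard_normal((p, N))) / np.sqrt(2)
X = A @ f + w
S = X @ X.conj().T / N
k_aic, k_mdl = wax_kailath_aic_mdl(np.linalg.eigvalsh(S), N)
```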

Cremona and Brandon [63] remark that statistical tests, the AIC criterion, and the FPE criterion are restricted to recursive minimum prediction error methods. Despite their good estimation properties, they are statistically based: they are partially subjective techniques because they rely on the asymptotic properties of the estimates to form the model order estimation strategy.

In this chapter, an abstract setting via the Lebesgue-Radon-Nikodym derivative was provided to illustrate that order determination and estimation are, collectively, pieces of the same task: locating or discovering the distribution that best describes the sampled data. Most of the methods for order estimation surveyed are variations of information-theoretic approaches. There are also approaches from the points of view of coding theory, maximum likelihood, maximum entropy, and the classical regression methods of statistics. Also included in this review are reviews of comparisons among techniques.
The approach of this thesis is a classical Neyman-Pearson hypothesis testing approach. It requires knowledge of the density functions of the distributions of interest and specification of the acceptable probability of error of a test. The mathematics for small sample complex principal components analysis, the simplest of the multivariate cases relying on sampling from a complex vector normal distribution, has not previously been worked out, although many of the necessary pieces have. The next chapter reviews what I have learned about the existing background material.


Chapter  5 
PREVIOUS WORK

The  purpose  of  this  chapter  is  to  present  material  I  have  found  which  provides 
the  necessary  foundations  for  the  development  of  the  small  sample  complex 
principal  components  analysis  approach  for  order  determination.  Three  main 
areas  will  be  reviewed:  array  processing,  statistics,  and  mathematics. 

Traditional lines of demarcation between disciplines become very inappropriate when studying the order identification problem in array processing. Although the problem is motivated by acoustic signal processing goals, the appropriate locus of solutions lies beyond the traditional mathematical training of engineers and statisticians, in research areas pursued by specialists in mathematics and statistics. The history of the development of the necessary mathematics reveals that much of the important mathematical theory has been developed by application scientists. What most engineers and statisticians consider pure or abstract mathematics truly forms the working set of knowledge necessary for making headway in the solution of practical array processing problems.

With this in mind, I have rather artificially clustered historical work as follows. Under the title of “array processing” I have collected works drawn primarily from the acoustics, ocean engineering, and electrical engineering literature. These works deal primarily with exploration of different principles and algorithms. Material collected under the heading of “statistics” is further partitioned into “eigenvalue distribution and testing” and “complex statistics other than eigenvalue testing”. The set called “eigenvalue distribution and testing” discusses work done with respect to eigenvalues of both real and complex Wishart matrices. It focuses on the form of the test statistics, the distributions of the eigenvalues, and the distributions of the test statistics. The material collected under “complex statistics other than eigenvalue testing” refers to the body of literature that forms the supporting background theory for eventually developing the necessary tests and test statistic distributions. Under the final grouping of “mathematics” I have included material related to the development of zonal polynomials and hypergeometric functions of matrix argument, presented independently of the context of statistics. This is necessarily set in the context of group representation theory, which provides the foundation for these functions.

Not mentioned, yet present in the background, is the vast body of knowledge collected under the subject of Lie theory. There is some artificiality here
because  much  of  the  ancestral  work  is  by  physicists  and  statisticians  seeking 
answers  to  the  eigenvalue  testing  problem.  There  is  a  lot  of  interplay  between 
these  groupings.  With  just  a  little  exposure  to  the  literature,  one  can  see  that 
the  overlap  is  tremendous. 
5.1 Array Processing

A well-written brief tutorial review of beamforming methods was presented by Johnson [125] as an invited paper for the Proceedings of the IEEE, as pointed out in the introduction. One of the methods he discussed was the Maximum Likelihood Method (MLM). He references three articles on the subject [76][51][52]. The approach is to find the steering vector a that yields the minimum beam energy $a^H R a$ subject to the constraint $a^H b = 1$, where b represents an ideal plane wave corresponding to the desired direction of look and R represents the spatial correlation matrix. The solution is $a = R^{-1}b/(b^H R^{-1} b)$.
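A minimal sketch of the MLM solution, assuming a hypothetical half-wavelength uniform line array for the look vector b. The attained minimum beam energy is $1/(b^H R^{-1} b)$, the quantity usually scanned over look directions. The optimality check in the usage lines perturbs a within the constraint set; because $Ra \propto b$, any such perturbation can only raise the energy.

```python
import numpy as np

def ula_steering(m, theta_deg):
    """Hypothetical look vector for a half-wavelength uniform line array."""
    return np.exp(1j * np.pi * np.arange(m) * np.sin(np.deg2rad(theta_deg)))

def mlm_weights(R, b):
    """Minimize a^H R a subject to a^H b = 1:  a = R^{-1} b / (b^H R^{-1} b)."""
    Rinv_b = np.linalg.solve(R, b)
    return Rinv_b / (b.conj() @ Rinv_b)

def mlm_power(R, b):
    """Minimum beam energy 1 / (b^H R^{-1} b)."""
    return 1.0 / np.real(b.conj() @ np.linalg.solve(R, b))

rng = np.random.default_rng(6)
m = 6
G = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
R = G @ G.conj().T + np.eye(m)          # a Hermitian positive definite test matrix
b = ula_steering(m, 10.0)
a = mlm_weights(R, b)
```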

Another approach is the eigenspace approach. The idea of an eigenspace approach to signal processing is not new. In particular, the principal component analysis approach is now considered classic. Priestley et al. [210] discussed the application of principal component analysis and factor analysis to multivariate systems for the purpose of dimensionality reduction. They chose as their goal to obtain the best r-dimensional representation of the system output vector Y(t). Their method is as follows. Apply the discrete Fourier transform to Y(t), obtaining Y(ω), and then obtain eigenvalue decompositions of the resulting frequency-dependent covariance matrices. Process Y(ω) with the eigenvectors corresponding to the r largest eigenvalues, obtaining r-dimensional frequency-domain principal components. The r-dimensional time-domain output $Y_r(t)$ is obtained by an inverse discrete Fourier transform. The authors point out that there may be an aliasing problem, as discussed by Haggan and Priestley [100], who successfully applied the method to a real system. The issue of order estimation was not discussed.
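A simplified sketch of the frequency-domain principal components procedure. It assumes the frequency-dependent covariance (spectral) matrices are estimated by averaging raw periodogram matrices over neighbouring frequency bins; this estimator is a choice made here for illustration and is not necessarily the one used by Priestley et al. Eigenvector phases are arbitrary at each frequency, so the time-domain output is not unique, and the sketch ignores this issue as well as aliasing.

```python
import numpy as np

def frequency_domain_pca(Y, r, half_width=4):
    """Simplified frequency-domain principal components of a multivariate series.

    Y: (T, p) real or complex array; r: components kept; half_width: bins averaged
    on each side when estimating the spectral matrix.  Returns a (T, r) complex
    array of time-domain components.
    """
    T = Y.shape[0]
    F = np.fft.fft(Y, axis=0)                                  # Y(omega), shape (T, p)
    P = np.einsum("fi,fj->fij", F, F.conj()) / T              # raw periodogram matrices
    # average neighbouring bins (circularly) so the estimates are not rank one
    S = sum(np.roll(P, s, axis=0) for s in range(-half_width, half_width + 1))
    S = S / (2 * half_width + 1)
    Z = np.empty((T, r), dtype=complex)
    for f in range(T):
        _, V = np.linalg.eigh(S[f])                             # ascending eigenvalues
        top = V[:, ::-1][:, :r]                                 # eigenvectors of the r largest
        Z[f] = top.conj().T @ F[f]                              # frequency-domain components
    return np.fft.ifft(Z, axis=0)                               # r-dimensional time-domain output

rng = np.random.default_rng(7)
s = rng.standard_normal(256)
Y = np.outer(s, [1.0, -2.0, 0.5])        # a rank-one 3-variate test series
z = frequency_domain_pca(Y, r=1)         # one component should carry all the energy
```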

Schmidt discussed the MUSIC (Multiple Signal Classification) theory in March 1986 [238]. Consider a sonar array with m elements. Let the noise at these elements be given by the column vector w, where $w^T = (w_1, \dots, w_m)$. Let d be the number of signals, which are independent of the noise. Denote the set of signals by the column vector f defined by $f^T = (f_1, \dots, f_d)$. Each signal $f_i$ has its own beamformer parameter index (which we usually associate with the direction of arrival $\theta_i$). The transfer function of the beamformer on the set of signals is given by the matrix $A = [a(\theta_1), \dots, a(\theta_d)]$, where each $a(\theta_i)$ describes the response of the beamformer to a signal coming from direction $\theta_i$. It is assumed that the beamformer function $a(\theta)$ is known for all θ. For this reason, for a collection of specific desired look directions, the matrix A defines a set of vectors that form a basis (in the sense of linear algebra) for the space in which signals processed by those beams can be described. Therefore, we can use matrix A to form an orthonormal basis for the space containing the signals. All of the signals and some of the noise processed by the beamformer will be contained in this space. The orthogonal complement of this space will contain only noise. The beamformer output of signal plus noise is given by the column vector $x = Af + w$. Let the signal covariance matrix P be defined by $P = E[ff^H]$, and let the noise covariance matrix $\lambda^2 S_0$ be defined by $\lambda^2 S_0 = E[ww^H]$. (If you compare this to Schmidt’s paper, you will notice I use λ as a singular value throughout, and thus λ² is the eigenvalue.) The expected value of the covariance matrix of the beamformer output is given by $S = APA^H + \lambda^2 S_0$. The eigenvalues of S and of $(S - \lambda^2 S_0)$, both taken in the metric of $S_0$, differ by λ²; when d < m, λ² is the smallest of them, $\lambda^2_{\min}$. The multiplicity of $\lambda^2_{\min}$ in matrix S, or the multiplicity of the zero eigenvalue in $(S - \lambda^2_{\min} S_0)$, tells us the dimension of the space containing only noise, and therefore we also know the dimension of the space containing the signals. The problem is stated in terms of examining the roots of the characteristic equation

$$\det\left[APA^H - (\lambda_i^2 - \lambda^2_{\min})S_0\right] = \det\left(S - \lambda_i^2 S_0\right) = 0$$

The paired sets of eigenvalues and eigenvectors $\{(\lambda_i^2, q_i)\}_{i=1}^{m}$ are called eigensolutions of S with respect to $S_0$. Another terminology used is that these are eigensolutions of S in the metric of $S_0$. Schmidt notes that the eigensolutions satisfy the relationships $S q_i = \lambda_i^2 S_0 q_i$ and $APA^H q_i = (\lambda_i^2 - \lambda^2_{\min}) S_0 q_i$. So, the goal is to construct a test to see how many of the smallest $\{\lambda_i^2\}_{i=1}^{m}$ are equal. Various authors have chosen different approaches to identifying this multiplicity, including the selection of the estimator of $APA^H$ upon which to base tests. Note that the maximum rank of $APA^H$ is min(d, m). The rank can be reduced by singularity of P.
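A minimal MUSIC sketch for the common special case $S_0 = I$. It assumes a hypothetical half-wavelength uniform line array and treats d as known, which is precisely the quantity the order-determination tests are meant to supply. The pseudo-spectrum $1/\lVert E_n^H a(\theta)\rVert^2$ peaks where $a(\theta)$ is nearly orthogonal to the estimated noise eigenvectors $E_n$.

```python
import numpy as np

def ula_steering(m, theta_deg):
    """Hypothetical half-wavelength uniform line array: a_k(theta) = exp(j*pi*k*sin(theta))."""
    k = np.arange(m)[:, None]
    return np.exp(1j * np.pi * k * np.sin(np.deg2rad(np.atleast_1d(theta_deg))))

def music_spectrum(R, d, grid_deg):
    """MUSIC pseudo-spectrum 1 / ||E_n^H a(theta)||^2 with S_0 = I and d assumed known."""
    m = R.shape[0]
    _, V = np.linalg.eigh(R)              # eigenvalues in ascending order
    En = V[:, : m - d]                    # eigenvectors of the m-d smallest eigenvalues
    A = ula_steering(m, grid_deg)         # shape (m, len(grid_deg))
    return 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)

def simulate_snapshots(m, thetas_deg, N, snr_db, rng):
    """x = A f + w with unit-power complex Gaussian signals and noise."""
    A = ula_steering(m, thetas_deg)
    d = A.shape[1]
    f = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
    w = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
    return np.sqrt(10 ** (snr_db / 10)) * A @ f + w

rng = np.random.default_rng(1)
X = simulate_snapshots(8, [-20.0, 25.0], 500, 10.0, rng)
R_hat = X @ X.conj().T / X.shape[1]
grid = np.arange(-90.0, 90.0, 0.1)
P = music_spectrum(R_hat, 2, grid)
peaks = [i for i in range(1, len(P) - 1) if P[i] > P[i - 1] and P[i] > P[i + 1]]
best = sorted(sorted(peaks, key=lambda i: P[i])[-2:])   # two tallest local maxima, by angle
est_deg = grid[best]
```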
Kaveh and Barabell [130] credit Kumaresan and Tufts [156] with the Minimum Norm method. Kumaresan and Tufts considered a line array, and their description is repeated here, with the authors’ variables renamed to make comparison with the MUSIC algorithm easier. Let the number of elements of this array be m. Assume a known number of sources, which we will call d. Let the m × m signal-plus-noise covariance matrix S of the beamformer output be estimated by R. Let R have the eigenvalue decomposition $R = PL^2P^H$, corresponding to the eigenvalue decomposition of S given by $S = Q\Lambda^2Q^H$. Let $a(\theta_k) = a_k$ be the direction vector associated with source number k, whose direction of arrival at the array is at an angle related to $\theta_k$. The problem is to estimate $a_k$. If a vector $b = [b_1, \dots, b_m]^T$ has the property that $a_k^H b = 0$ for each source k, then the polynomial $D(z) = \sum_{i=1}^{m} b_i z^{-(i-1)}$ has roots at the values of z corresponding to the $\theta_k$ (for a line array with $a_k = [1, z_k, \dots, z_k^{m-1}]^T$ and $|z_k| = 1$, the roots are at $z = z_k$). The eigenvectors $\{q_k\}_{k=d+1}^{m}$ of S corresponding to the noise eigenvalues have this property. This is approximately true for the sample eigenvectors $\{p_k\}_{k=d+1}^{m}$ computed from R corresponding to the noise subspace.

The goal is to find a vector b lying in the noise subspace of R. The name “Minimum-Norm” comes from the following criterion: the Euclidean length (norm) of b is required to be minimized. To make the solution unique, the first element of b is constrained to be unity.
Partition the sample eigenvectors $P = [P_S, P_N]$ into the set

$$P_S = (p_1, \dots, p_d) = \begin{bmatrix} g^H \\ P_{[S]} \end{bmatrix}$$

corresponding to the signal subspace, and

$$P_N = (p_{d+1}, \dots, p_m) = \begin{bmatrix} c^H \\ P_{[N]} \end{bmatrix}$$

corresponding to the noise subspace. The vectors $g^H$ and $c^H$ are the top rows of their respective matrices. The matrices $P_{[S]}$ and $P_{[N]}$ are merely the remainders of their respective partitions. The solution is given by

$$b = \begin{bmatrix} 1 \\ P_{[N]}\,c/(c^H c) \end{bmatrix} = \begin{bmatrix} 1 \\ -P_{[S]}\,g/(1 - g^H g) \end{bmatrix}$$

where the top element of b is unity; the two forms agree because $c^H c = 1 - g^H g$.

Their theory is developed by partitioning the eigenstructure of the underlying covariance matrix of the beamformer output (not the sample covariance matrix). The discussion implied that the simulated sample covariance matrix was decomposed. The number of signals was assumed known.
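The partitioned Minimum-Norm solution can be checked numerically. The sketch below assumes a hypothetical half-wavelength uniform line array with $a_k = [1, z_k, \dots, z_k^{m-1}]^T$ and $z_k = e^{j\pi\sin\theta_k}$. It builds b from the noise partition, $[1;\,P_{[N]}c/(c^Hc)]$, and from the signal partition, $[1;\,-P_{[S]}g/(1-g^Hg)]$, which agree because $c^Hc = 1-g^Hg$. It then roots $D(z) = \sum_i b_i z^{-(i-1)}$ and keeps the d roots nearest the unit circle.

```python
import numpy as np

def ula_steering(m, theta_deg):
    """Hypothetical half-wavelength uniform line array: a_k(theta) = exp(j*pi*k*sin(theta))."""
    k = np.arange(m)[:, None]
    return np.exp(1j * np.pi * k * np.sin(np.deg2rad(np.atleast_1d(theta_deg))))

def min_norm_vector(R, d):
    """Minimum-Norm vector (b[0] = 1) from the eigenvectors of R, computed both ways."""
    _, V = np.linalg.eigh(R)
    V = V[:, ::-1]                     # columns ordered by descending eigenvalue
    Ps, Pn = V[:, :d], V[:, d:]
    c = Pn[0, :].conj()                # top row of P_N is c^H
    g = Ps[0, :].conj()                # top row of P_S is g^H
    b_noise = np.concatenate([[1.0], Pn[1:, :] @ c / (c.conj() @ c)])
    b_signal = np.concatenate([[1.0], -Ps[1:, :] @ g / (1.0 - g.conj() @ g)])
    return b_noise, b_signal

def min_norm_angles(b, d):
    """Roots of D(z) = sum_i b_i z^{-(i-1)} nearest the unit circle, mapped to angles in degrees."""
    roots = np.roots(b)                               # roots of sum_i b_i z^{m-i}
    nearest = roots[np.argsort(np.abs(np.abs(roots) - 1.0))[:d]]
    return np.sort(np.rad2deg(np.arcsin(np.angle(nearest) / np.pi)))

rng = np.random.default_rng(4)
m, N, thetas = 8, 1000, [-20.0, 25.0]
A = ula_steering(m, thetas)
f = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
w = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
X = 10.0 * A @ f + w                      # 20 dB per-source signal-to-noise ratio
R_hat = X @ X.conj().T / N
b_noise, b_signal = min_norm_vector(R_hat, d=2)
est_deg = min_norm_angles(b_noise, d=2)
```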

Kaveh and Barabell [130] evaluated the asymptotic statistical performance of the MUSIC and Minimum Norm algorithms in April 1986 against closely spaced narrowband plane waves. The Minimum Norm null-spectrum had a smaller bias at a source angle than the MUSIC null-spectrum. In a simulation, a fixed resolution threshold was achieved at a lower signal-to-noise ratio by the Minimum Norm method than by MUSIC.

Moghaddamjoo reported simulation results in [184]. It was shown that if at least one signal eigenvalue is close to the noise-related eigenvalues, then the associated eigenvector will have significant errors, which translate into a significant bearing error. This is to be expected in a low signal-to-noise ratio case.

Wax, Shan, and Kailath [279] discussed eigenstructure methods for beamforming for both the narrowband and wideband cases. They specialized their treatment to a line array of equally spaced sensors.

Each of the m sensors feeds a delay line with p registers. Each independent sample thus contains mp pieces of data. The number of sources d is unknown. Under Gaussian assumptions, the test statistic for the number of sources is given by Anderson’s likelihood ratio, given here as equation 5.1, which is expressed in terms of the eigenvalues of the sample covariance matrix. Wax et al. discussed the application of the asymptotic case for the distribution of the statistic.


Scharf [237] proposed looking at the quantity he calls divergence, which is the difference between the expected values of the log likelihood ratio test statistic under the null and alternate hypotheses. Based on this, he points out that it is the sum $\lambda_n^2 + \lambda_n^{-2} = 2\cosh(2\ln\lambda_n)$, not the value of $\lambda_n^2$ itself, that determines the contribution of an eigenvalue to divergence. He presents an algorithm that selects the dominant eigenvalues. He also notes that each eigenvalue satisfies the generalized eigenproblem $(R_1 - \lambda^2 R_0)x = 0$, where $R_0$ is the covariance matrix under the null hypothesis and $R_1$ is the covariance under the alternative. This is not restricted to the case $R_1 = R_0 + R_s$, where $R_s$ is the signal covariance; however, that is the usual assignment.
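A minimal sketch, assuming zero-mean Gaussian models. The generalized eigenvalues λ² of $(R_1, R_0)$ are found by whitening with the Cholesky factor of $R_0$, and each contributes $\lambda^2 + \lambda^{-2} - 2$ to the symmetric divergence $\mathrm{tr}(R_0^{-1}R_1) + \mathrm{tr}(R_1^{-1}R_0) - 2p$ (up to a constant factor that depends on real versus complex data). Ranking by this quantity, as Scharf's argument suggests, treats λ² = 4 and λ² = 1/4 alike. The function names are hypothetical.

```python
import numpy as np

def generalized_eigenvalues(R1, R0):
    """Solve (R1 - lam2 * R0) x = 0 for Hermitian R1 and positive definite R0."""
    L = np.linalg.cholesky(R0)                  # R0 = L L^H
    Linv = np.linalg.inv(L)
    W = Linv @ R1 @ Linv.conj().T               # same eigenvalues as R0^{-1} R1
    return np.linalg.eigvalsh(W)

def divergence_contributions(lam2):
    """lam2 + 1/lam2 - 2 = 2 cosh(2 ln lam) - 2 for each eigenvalue lam2 = lam^2."""
    lam2 = np.asarray(lam2, dtype=float)
    return lam2 + 1.0 / lam2 - 2.0

def dominant_eigenvalues(R1, R0, r):
    """The r generalized eigenvalues with the largest divergence contributions."""
    lam2 = generalized_eigenvalues(R1, R0)
    contrib = divergence_contributions(lam2)
    keep = np.argsort(contrib)[::-1][:r]
    return lam2[keep], contrib[keep]

R0 = np.eye(3)
R1 = np.diag([4.0, 0.25, 1.1])
lam2_kept, contrib_kept = dominant_eigenvalues(R1, R0, r=2)   # keeps 4 and 0.25
```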

Friedlander presented an eigenspace approach to interference cancellation in his nicely written December 1988 paper [88]. The key to his approach is constructing a weighting vector W such that W lies in the signal subspace and is orthogonal to the interference component of the array manifold. The array manifold a(γ) is defined to be that portion of the factorization of the array response function which is due to the geometry of the array (a function of the time delay from each array element to a reference point) and the steering direction γ. The signal subspace is defined to be the span of the array manifold vectors associated with each signal and interference source, $[a(\gamma_{\text{signal}}), a(\gamma_{\text{source 2}}), \dots, a(\gamma_{\text{source } p})]$. This same subspace is also spanned by the eigenvectors associated with the largest eigenvalues of the covariance matrix of the received signals.

The innovation of this paper is finding a way around having to know a(γ) for each interfering source. To overcome this, he considers a cost function based on the spectral characteristics of the desired signal. The method will not work directly where coherent multipath is present, but a possible modification is proposed. Although his paper is written in terms of eigenvalue decompositions, he makes it clear that use of the singular value decomposition is a related idea.

Fuchs [89] also discusses an eigenspace approach in the same journal issue. He bases his approach on a matrix perturbation analysis. Lee and Wengrovitz [163] studied the ability of MUSIC to separate closely spaced sources when a beamforming preprocessor is used. They showed that this technique, called Beamspace MUSIC, performed better than the Minimum Norm technique; the key is to reduce the dimension of the noise subspace. They also suggested that a beamforming preprocessor improves the performance of the Minimum Norm algorithm.


5.2  Statistics 

5.2.1  Eigenvalue  Distributions  and  Testing 

In statistics, the problem being considered is part of the exact Principal Components Analysis (PCA) problem for the complex variables case. The inner product of a data vector with the k-th eigenvector of the sample covariance matrix is the k-th principal component. The sample variance of the k-th principal component is the corresponding sample eigenvalue $l_k^2$. Eigenvalues go by several names in the literature; they are also known as characteristic roots and as latent roots. A very important fact [197] is that the eigenvalues of a sample covariance matrix are all different, with probability 1. A wonderful introduction to Principal Component Analysis is given in Chapter 8 of reference [186]. Sections of special interest are 8.3 (geometrical meaning of principal components) and 8.7 (sampling properties of principal components).
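The definitions above can be checked directly. This is a minimal sketch assuming zero-mean complex snapshots stored one per column: the k-th component is the inner product of each data vector with the k-th eigenvector, its sample variance equals $l_k^2$, and the components are mutually uncorrelated in the sample. The random data are illustrative only.

```python
import numpy as np

def principal_components(X):
    """Complex PCA of zero-mean snapshots X with shape (p, n): one column per data vector.

    Returns the sample eigenvalues l_k^2 (descending), the eigenvectors, and the
    components Z, whose k-th row holds the inner products of the data vectors with
    the k-th eigenvector.
    """
    n = X.shape[1]
    S = X @ X.conj().T / n                 # sample covariance, zero mean assumed
    l2, V = np.linalg.eigh(S)              # eigh returns ascending eigenvalues
    l2, V = l2[::-1], V[:, ::-1]
    Z = V.conj().T @ X
    return l2, V, Z

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200))
l2, V, Z = principal_components(X)
sample_var = np.mean(np.abs(Z) ** 2, axis=1)   # equals l2
```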

In geophysics and meteorology, principal components are known as Empirical Orthogonal Functions. A solution to the problem consists of several parts. The first and easiest part is the specification of test statistics. The next part is obtaining the distributions of the test statistics. Closed-form solutions are desired but not always obtainable; sometimes they are obtainable only with great effort or clever tricks. Often the density of a desired distribution is the marginal density of some obtainable joint distribution that is difficult to integrate.

Von Storch and Hannoschock [261] discussed estimating principal components in the small sample case in the context of meteorology. Their conclusions are important enough to repeat.

1. The sample eigenvalue $l_i^2$ is a considerably biased estimator of the true eigenvalue $\lambda_i^2$. The bias is positive for the largest $\lambda_i^2$ and negative for the smallest eigenvalue. It is of the order of 1/m, where m is the number of independent samples. The variance of $l_i^2$ is of the order of 1/m too.

2. By means of correction methods, unbiased eigenvalue estimators are constructed. However, the decrease of the bias is accompanied by an increase of the estimator’s variance. For the largest eigenvalue, at least, the jackknife yields favorable results.

3. The following comments are in the context of estimated second moments of generalized Fourier coefficients of a fixed set of principal components. On average, for small i (large i) the sample eigenvalue $l_i^2$ will considerably overestimate (underestimate) the variance expressed by the corresponding principal component. The covariances are generally not negligible. This means that the independence of the coefficients with respect to the eigenvectors of the true covariance matrix cannot be transferred to principal component coefficients derived from the sample covariance matrix.
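The bias in conclusion 1 is easy to see by simulation. This is a minimal sketch of the extreme case in which all population eigenvalues equal 1 (real Gaussian data, chosen here for illustration). The sorted sample eigenvalues spread out, the largest on average well above 1 and the smallest well below 1, while their mean stays near 1 because the trace is unbiased.

```python
import numpy as np

def mean_sorted_sample_eigenvalues(p, m, reps, rng):
    """Average of the sorted (descending) sample eigenvalues over `reps` trials.

    Each trial draws m independent real N(0, I_p) vectors, so every population
    eigenvalue is 1 and any systematic departure from 1 is bias.
    """
    total = np.zeros(p)
    for _ in range(reps):
        X = rng.standard_normal((p, m))
        S = X @ X.T / m                              # known zero mean
        total += np.sort(np.linalg.eigvalsh(S))[::-1]
    return total / reps

rng = np.random.default_rng(3)
avg = mean_sorted_sample_eigenvalues(p=5, m=20, reps=2000, rng=rng)
```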

Kshirsagar [154] (p. 58) gives a fascinating review of the history of the derivation of the real Wishart distribution. He says the case of p = 2 was first derived by Fisher in 1915 [83], and that Wishart derived it for general p in 1928 [290]. It was in 1935 (almost yesterday, when my father was 17) that Fisher published his paper [84] on the density and cumulative distribution functions for the univariate and t distributions. In 1937, Hoel [108] derived approximations for the distribution of the generalized variance (the determinant of the covariance matrix), one for the case when samples are not too small, and the other for large samples. Two early papers on principal components are by Hotelling [113] in 1933 and by Girshick [90] in 1936, both of which are referenced in one of the early works on the distribution of sample eigenvalues, by Girshick [91] in
1939.  In  this  1939  paper,  he  derives  the  asymptotic  distribution  for  the  sample 
eigenvalues of a real Wishart matrix, as well as other quantities. Let $\sigma_{ij}$ be the population covariance between random variables $x_i$ and $x_j$ in the multivariate normal random vector $\mathbf{x}' = (x_1, \ldots, x_p)$. The fundamental equation derived by Girshick in this paper is his equation (3.11),

$$E\{d\sigma_{ij}\,d\sigma_{km}\} = \frac{1}{n}\left(\sigma_{ik}\sigma_{jm} + \sigma_{im}\sigma_{jk}\right)$$
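For a Wishart matrix this relation holds exactly, and it can be checked by simulation. The sketch below assumes real zero-mean normal data with known mean, so that $S = X'X/n$ has $n$ degrees of freedom, and takes $d\sigma_{ij}$ to be $s_{ij} - \sigma_{ij}$; the function names and the $2 \times 2$ covariance are illustrative choices, not Girshick's.

```python
import numpy as np

def girshick_cov(sigma, n, i, j, k, m):
    """Right-hand side of (3.11): (sigma_ik * sigma_jm + sigma_im * sigma_jk) / n."""
    return (sigma[i, k] * sigma[j, m] + sigma[i, m] * sigma[j, k]) / n

def monte_carlo_cov(sigma, n, i, j, k, m, reps=100_000, seed=0):
    """Empirical covariance between the sample covariances s_ij and s_km.

    Assumes zero-mean real normal data with the mean known, so that
    S = X'X / n has n degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(sigma)
    p = sigma.shape[0]
    # Each row of x[r] is an independent N_p(0, sigma) vector
    x = rng.standard_normal((reps, n, p)) @ chol.T
    # One sample covariance matrix per replication
    s = np.einsum("rti,rtj->rij", x, x) / n
    return np.cov(s[:, i, j], s[:, k, m])[0, 1]

sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
n = 20
for idx in [(0, 0, 0, 0), (0, 1, 0, 1), (0, 0, 1, 1)]:
    print(idx, round(monte_carlo_cov(sigma, n, *idx), 4), girshick_cov(sigma, n, *idx))
```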

From this equation, he produces his other results. Specifically, the variance of the sample eigenvalue $l_i^2$ is given by $\mathrm{var}(l_i^2) = 2\lambda_i^4/n$, where $n$ is the number of samples from which the estimate is derived. (When comparing the formulae written here with Girshick's, remember that in this thesis the $\{l_k\}$ are estimates of the singular values $\{\lambda_k\}$, so the sample eigenvalues are the $\{l_k^2\}$.) The set of quantities

$$\left\{ \sqrt{\frac{n}{2}}\;\frac{l_i^2 - \lambda_i^2}{\lambda_i^2} \right\}_{i=1}^{p}$$

is distributed asymptotically $N_p(0, I_p)$. By a clever insight, Girshick considers the quantity $\log l_i^2$ as a way to eliminate the population eigenvalue $\lambda_i^2$ from the variance. By applying a Taylor series expansion and ignoring higher order terms, he finds the asymptotic variance of $\log l_i^2$, which is given by $\mathrm{var}(\log l_i^2) = 2/n$. As an aside, Girshick uses the convention we have come to associate with Einstein: a repeated subscript in the same term stands for summation. If repeated subscripts appearing in a term are not to be summed, they are placed in brackets following the expression in which they appear.
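Girshick's first-order result, $\mathrm{var}(\log l_i^2) \approx 2/n$ for distinct population eigenvalues, can be checked with a short Monte Carlo sketch. It assumes real zero-mean normal data, a diagonal population covariance with well-separated eigenvalues, and $S = X'X/n$; the eigenvalues $10, 3, 1$ and $n = 400$ are illustrative. The `einsum` call is itself an instance of the summation convention just described: the index `t` is repeated and summed, while `r`, which also appears in the output, is not.

```python
import numpy as np

def log_eigenvalue_variances(pop_eigs, n, reps=4000, seed=2):
    """Monte Carlo variance of log l_i^2 for each sample root, largest first.

    Assumes zero-mean real normal data with diagonal population covariance
    diag(pop_eigs) (distinct values) and S = X'X / n.
    """
    rng = np.random.default_rng(seed)
    p = len(pop_eigs)
    # Scale each column so that column i has variance pop_eigs[i]
    x = rng.standard_normal((reps, n, p)) * np.sqrt(pop_eigs)
    s = np.einsum("rti,rtj->rij", x, x) / n     # sums over the repeated index t
    roots = np.linalg.eigvalsh(s)[:, ::-1]      # l_1^2 >= l_2^2 >= ...
    return np.log(roots).var(axis=0, ddof=1)

n = 400
print(log_eigenvalue_variances(np.array([10.0, 3.0, 1.0]), n), "vs 2/n =", 2 / n)
```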

Lawley  [161]  studied  tests  involving  the  latent  roots  of  sample  covariance 
and  correlation  matrices  in  1956.  His  interest  was  in  those  cases  where  the 
effects of the $k$ largest latent roots have been removed, and he tested the hypothesis that the remaining roots are equal. The Principal Components Analysis problem for the raw covariance matrix was solved by T. W. Anderson in his 1963 paper [24], which has become a classic in the statistics literature. He gives a test of significance on eigenvalues for the large sample case where the data are sampled from the real multivariate normal distribution. In the immediately following article, Lawley [162] extended Anderson's
result  to  test  a  set  of  correlation  coefficients  for  equality.  It  was  solved  for  the 
large  sample  complex  multivariate  normal  distribution  case  by  R.  P.  Gupta 
[98]  in  1965  who  purposefully  paralleled  Anderson’s  derivations  and  used  the 
same  notation  as  closely  as  possible.  Work  on  the  asymptotic  cases  has  been 
continued  by  Tyler  [269][270]. 
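For orientation, the large-sample likelihood ratio statistic usually associated with Anderson's 1963 paper for the hypothesis that the $q = p - k$ smallest population roots are equal is $-n\sum_{j=k+1}^{p}\log l_j^2 + nq\log\left(\frac{1}{q}\sum_{j=k+1}^{p} l_j^2\right)$, referred to a chi-square distribution with $q(q+1)/2 - 1$ degrees of freedom in the real case. The sketch below computes it without Lawley's correction factor, assumes $n$ is the Wishart degrees of freedom, and does not cover the complex case treated by Gupta, where the degrees of freedom differ. By the arithmetic-geometric mean inequality the statistic is zero when the $q$ roots are equal and positive otherwise.

```python
import numpy as np

def equal_smallest_roots_statistic(sample_roots, k, n):
    """Large-sample statistic for H: the q = p - k smallest population roots are equal.

    Real-case form: -n * sum(log l_j^2) + n * q * log(mean of l_j^2), j = k+1..p,
    referred to chi-square with q(q+1)/2 - 1 degrees of freedom. No small-sample
    (Lawley) correction is applied; n is the Wishart degrees of freedom.
    """
    l = np.sort(np.asarray(sample_roots, dtype=float))[::-1]   # descending order
    tail = l[k:]                                                # the q smallest roots
    q = tail.size
    stat = -n * np.sum(np.log(tail)) + n * q * np.log(tail.mean())
    dof = q * (q + 1) // 2 - 1
    return stat, dof

print(equal_smallest_roots_statistic([9.0, 4.0, 1.1, 1.0, 0.9], k=2, n=50))
```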

The solution for the exact PCA problem was considered intractable for a long time, since small sample cases are often much more difficult than the corresponding asymptotic cases. That is why the asymptotic cases are studied. This statement provides an opportunity to establish an important and easy to miss point. The label “asymptotic” is ambiguous because it is used in two different ways in the technical literature. The commonly assumed meaning is the large sample case, obtained by letting the number of samples tend to infinity. The second meaning is related to the number of terms carried in the expansion of an exact or approximate expression. In this second case, a small number of terms may yield a more accurate approximation than a large number of terms. This point is nicely discussed in Keener's text [131] (p. 425) as follows. He defines $f_n(z)$ to be asymptotic to $f(z)$ if

$$\lim_{z \to \infty} \left| z^{n}\left(f_n(z) - f(z)\right) \right| = 0$$

for a fixed value of $n$. This concept has nothing to do with convergence: it describes how well a fixed number of terms approximates $f(z)$ as $z \to \infty$, not what happens as more terms are added. A series can be asymptotic even though it is divergent. In fact, asymptotic series are often divergent, so beyond some point taking more terms is not simply more work; it actually makes the approximation worse.
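A standard illustration (not taken from Keener) is the Stieltjes function $f(x) = \int_0^\infty e^{-t}/(1 + xt)\,dt$, whose asymptotic series $\sum_k (-1)^k k!\,x^k$ diverges for every $x > 0$; here $x$ plays the role of $1/z$. In the sketch below, assumed for illustration, the exact value is computed by Gauss-Laguerre quadrature with NumPy's `laggauss`. For $x = 0.1$ the error of the partial sums shrinks until roughly 10 terms and then grows without bound.

```python
import math
import numpy as np

def stieltjes_exact(x, deg=60):
    """f(x) = integral_0^inf exp(-t) / (1 + x t) dt via Gauss-Laguerre quadrature."""
    t, w = np.polynomial.laguerre.laggauss(deg)   # nodes/weights for weight exp(-t)
    return float(np.sum(w / (1.0 + x * t)))

def stieltjes_partial_sum(x, n_terms):
    """First n_terms of the divergent asymptotic series sum_k (-1)^k k! x^k."""
    return sum((-1) ** k * math.factorial(k) * x ** k for k in range(n_terms))

x = 0.1
exact = stieltjes_exact(x)
for n_terms in (2, 5, 10, 20, 40):
    err = abs(stieltjes_partial_sum(x, n_terms) - exact)
    print(f"{n_terms:2d} terms: error = {err:.3e}")
```

For this series the error after $N$ terms is bounded by the first omitted term, $N!\,x^N$, which is smallest near $N \approx 1/x$; that is why the best truncation point depends on $x$.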

Progress in attacking the small sample cases was motivated by two seminal works by James [118][120]. In 1960, he found the sampling distributions of the eigenvalues of the covariance matrix from a sample of the real multivariate normal distribution, relying on the representation theory of the linear group. In 1964, James extended his results to include distributions derived from the complex normal distribution, as well as other forms that are related to the multivariate normal distribution. He developed his results through the use of zonal polynomials of matrix argument, and expressed them in terms of hypergeometric functions of one and two matrix arguments. He carried out the derivations for real variables; based on the similarity of forms, he summarily wrote down the results for the complex case without proof. In 1966, James [121] applied his work to principal components in the case of a sample covariance matrix of real variables. An interesting observation he made concerns the effect of extreme roots on the likelihood ratio of other adjacent roots. Suppose that the ratios of the root $l_j^2$ to a pair of adjacent roots are both much less than 1 or both much greater than 1. Then the root $l_j^2$ influences the likelihood of the other two by a factor

(('?  - '?)('?+,  - 

Muirhead [187] elaborated on James' work, collecting many of the ideas into the setting of studying distribution theory for real multivariate analysis. Muirhead's book is the natural descendant of Anderson's classic text. Muirhead produced the first comprehensive text on multivariate distribution theory incorporating zonal polynomials, hypergeometric functions of matrix argument, and the application of exterior products. The importance of this development is its application to the derivation of noncentral distributions needed to evaluate the power of test statistics.

Krishnaiah was very active in developing exact and asymptotic distributions of eigenvalues and their tests based on both real and complex Wishart matrices, often expressing results in terms of zonal polynomials or using zonal polynomials in his proofs. Much of this work was done through the Aerospace Research Laboratories of the United States Air Force at Wright-Patterson Air Force Base. In 1969, Krishnaiah and Waikar [144] reported on tests of eigenvalues from a real Wishart matrix based on Roy's union-intersection principle. Effectively, the null hypothesis is

$$H: \lambda_1^2 = \lambda_2^2 = \cdots = \lambda_p^2$$

Five different alternative hypotheses were derived.

All 

$A_2:\ (\lambda_1^2 > \lambda_p^2) \cup (\lambda_2^2 > \lambda_p^2) \cup \cdots \cup (\lambda_{p-1}^2 > \lambda_p^2)$

$A_3:\ (\lambda_1^2 > \lambda_2^2) \cup (\lambda_1^2 > \lambda_3^2) \cup \cdots \cup (\lambda_1^2 > \lambda_p^2)$

A,  :  (Xj  ^  A’)  U  (A|  A’)  U  ■  ■  •  U  (AJ_,  ^  A’) 

$A_5:\ (\lambda_1^2 > \lambda_2^2) \cup (\lambda_1^2 > \lambda_3^2) \cup \cdots \cup (\lambda_1^2 > \lambda_p^2) \cup (\lambda_2^2 > \lambda_3^2) \cup \cdots \cup (\lambda_{p-1}^2 > \lambda_p^2)$

The joint densities for these tests were provided for the case of the real Wishart distribution, expressed in terms of the hypergeometric function of two matrix variables and in terms of normalized zonal polynomials. One of these is generalized in Muirhead's derivation [187] of his Theorem 3.2.20. The different cases are for different test statistics and alternative hypotheses. Krishnaiah and Waikar also worked out the asymptotic cases (large sample size). Three months later, Krishnaiah and Waikar [145] further developed the test against alternative $A_5$ by finding the density function for the corresponding test statistic.

In 1970, Krishnaiah and Chang [146] reported on the exact distribution of the smallest root of the real Wishart distribution $W_p(n, I_p)$, where they require the number $(n - p - 1)/2$ to be an integer. They accomplish this by changing variables from the sample eigenvalues to $(\theta_1, \ldots, \theta_p)$, where

$$\theta_i = \frac{l_i^2}{l_p^2}, \quad i = 1, \ldots, p-1, \qquad \theta_p = l_p^2,$$

and then integrating out $\theta_1, \ldots, \theta_{p-1}$. The result is expressed in terms of zonal polynomials. Three months later, Krishnaiah and Waikar [147] reported on the cumulative distribution function of the intermediate eigenvalue of the real Wishart matrix distributed as $W_p(n, I_p)$. The results are reported in an integral form. They assume the distribution of $l_{r+1}^2$ is known, and they look at

$$P\{l_r^2 < x\} = P\{l_{r+1}^2 < x\} - P\{l_p^2 < \cdots < l_{r+1}^2 < x < l_r^2 < \cdots < l_1^2\}$$
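Because this identity holds realization by realization, a simulation reproduces it exactly, which makes it a convenient sanity check when coding such a recursion. The sketch below assumes $S = X'X$ with $X$ an $n \times p$ standard normal matrix, so that $S \sim W_p(n, I_p)$; the function name and the default values of $p$, $n$, $r$, and $x$ are hypothetical.

```python
import numpy as np

def intermediate_root_counts(p=4, n=12, r=2, x=12.0, reps=5000, seed=3):
    """Count both sides of
        P{l_r^2 < x} = P{l_{r+1}^2 < x} - P{l_{r+1}^2 < x < l_r^2}
    over simulated S ~ W_p(n, I_p), with S = X'X and X an n x p standard
    normal matrix. Roots are ordered l_1^2 >= ... >= l_p^2; r is 1-based, r < p.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((reps, n, p))
    s = np.einsum("rti,rtj->rij", data, data)
    roots = np.linalg.eigvalsh(s)[:, ::-1]      # descending order
    l_r, l_r1 = roots[:, r - 1], roots[:, r]    # l_r^2 and l_{r+1}^2
    lhs = int(np.count_nonzero(l_r < x))
    rhs = (int(np.count_nonzero(l_r1 < x))
           - int(np.count_nonzero((l_r1 < x) & (x < l_r))))
    return lhs, rhs

lhs, rhs = intermediate_root_counts()
print("P{l_r^2 < x} estimates:", lhs / 5000, rhs / 5000)
```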

Lemma  2.1  of  [147]  is  referenced  in  later  reports.  In  a  separate  simultaneous 
report  [148],  they  show  how  to  evaluate 

P{Xi<ll<ll<X2} 


for the real variables case. This is expressed as the sum of four probabilities that are characterized by the details of the end points of evaluation. The message is to consider the different combinations suggested by the set of inequalities given in equation 5.2.


‘r-l 


‘»+l 


(5.2) 
Two months later, Waikar, Chang, and Krishnaiah [272] extended the work to find the joint density function of any few unordered roots of a noncentral complex Wishart matrix. Without loss of generality, they consider the first $r$ roots. The case for the central distribution was worked out by Wigner [286]. Waikar et al. used assumptions on the structure of the matrix $A$ that are different from those based on Goodman's work relating complex and real Gaussian distributions. Thus, some care is needed when combining results by one author with those of another.

In the Fall of 1971, Krishnaiah and Waikar [149] reported on the distribution of arbitrary consecutive ordered roots of the real Wishart matrix. This work includes the marginal density function and the cumulative distribution functions. Results are reported in integral form. In 1972, Davis [65] reported on the ratios of individual eigenvalues to the trace of a Wishart matrix. See also the work by Khatri [139] on the exact finite series distribution of the smallest or the largest eigenvalue. In that same year, Waikar, Chang, and Krishnaiah [273] derived expressions for the joint densities of any few unordered
roots  of  the  noncentral  complex  Wishart  matrix  (as  well  as  for  three  other 
matrices). 

In 1973, Krishnaiah [150] continued the study of eigenvalues of complex random matrices by deriving the exact distributions of some test statistics based on eigenvalues of the matrix $Z = A(A + B)^{-1}$, where $A \sim CW_p(n, \Sigma_1)$ and $B \sim CW_p(m, \Sigma_2)$. Krishnaiah computed the joint density function of $\left(l_1^2/l_p^2,\ l_2^2/l_p^2,\ \ldots,\ l_{p-1}^2/l_p^2\right)$, leaving the result as an integral and a product of sums of normalized zonal polynomials. He likewise computed the joint density function where the smallest sample eigenvalue is replaced by the sum of the sample eigenvalues in the denominators.

In 1974, Krishnaiah and Schuurmann [151] derived expressions for the distributions of the ratios of the intermediate roots to the trace of the real Wishart matrix, and of the intermediate roots of the real Wishart matrix. They obtained a relationship between the Laplace transforms of the ratios of the individual roots to the trace of the complex Wishart matrix $CW_p(n, I_p)$ and the distributions of the individual roots of this matrix. Using this relationship and expressions for the densities of the individual roots of the complex Wishart matrix, they obtained expressions for the distributions of the ratios of the individual roots to the trace of that matrix.

In 1976, Krishnaiah [152] (pp. 26-27) proposed two more tests of interest when it is known in advance that $\lambda_i^2 \ge \lambda_j^2$. The first test is for $H_{ij}: \lambda_i^2 \le d\lambda_j^2$ for $d > 1$ against the alternative $A_{ij}: \lambda_i^2 > d\lambda_j^2$, where $i > j$. Hypothesis $H_{ij}$ is not rejected if $l_i^2/(d\,l_j^2) < c_\alpha$, where

$$\Pr\{l_i^2/l_j^2 < d\,c_\alpha \mid \lambda_i^2 \le d\lambda_j^2\} = 1 - \alpha$$

The second test is for $H_{ij}: \lambda_i^2 - \lambda_j^2 \le d$ for $d > 0$ against the alternative $A_{ij}: \lambda_i^2 - \lambda_j^2 > d$. Hypothesis $H_{ij}$ is not rejected if $(l_i^2 - l_j^2 - d) < c_\alpha$, where

$$\Pr\{l_i^2 - l_j^2 < d + c_\alpha \mid \lambda_i^2 - \lambda_j^2 \le d\} = 1 - \alpha$$
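In practice $c_\alpha$ for the first test may be approximated by simulation. The sketch below is one way to do this under assumptions made for the sketch rather than taken from Krishnaiah: complex Gaussian data, a diagonal population covariance placed on the boundary $\lambda_i^2 = d\lambda_j^2$ of $H_{ij}$, and the remaining population eigenvalues treated as known. The function names and the values $(4, 2, 1)$, $n = 15$, $d = 2$ are hypothetical.

```python
import numpy as np

def simulate_root_ratios(pop_eigs, n, i, j, reps=20000, seed=4):
    """Simulated l_i^2 / l_j^2 for S = X^H X / n with complex Gaussian rows.

    pop_eigs: population eigenvalues (a diagonal population covariance is assumed);
    i, j: 0-based indices into the descending-ordered sample roots.
    """
    rng = np.random.default_rng(seed)
    p = len(pop_eigs)
    z = (rng.standard_normal((reps, n, p))
         + 1j * rng.standard_normal((reps, n, p))) / np.sqrt(2)
    x = z * np.sqrt(pop_eigs)                    # rows ~ CN_p(0, diag(pop_eigs))
    s = np.conj(np.swapaxes(x, 1, 2)) @ x / n
    roots = np.linalg.eigvalsh(s)[:, ::-1]      # l_1^2 >= ... >= l_p^2
    return roots[:, i] / roots[:, j]

def ratio_critical_value(pop_eigs, n, i, j, d, alpha=0.05, **kw):
    """c_alpha with Pr{l_i^2 / l_j^2 < d * c_alpha} = 1 - alpha, evaluated at the
    boundary lambda_i^2 = d * lambda_j^2, which pop_eigs is assumed to satisfy."""
    ratios = simulate_root_ratios(pop_eigs, n, i, j, **kw)
    return np.quantile(ratios, 1 - alpha) / d

pop = np.array([4.0, 2.0, 1.0])      # boundary case: lambda_1^2 = 2 * lambda_2^2
c_alpha = ratio_critical_value(pop, n=15, i=0, j=1, d=2.0)
print("c_alpha =", c_alpha)
```

A fresh simulation at the same boundary configuration should then reject $H_{ij}$, that is, give $l_i^2/(d\,l_j^2) \ge c_\alpha$, in about $100\alpha$ percent of replications.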

The premature death of the great statistician Paruchuri R. Krishnaiah from cancer, on 1 August 1987 in Pittsburgh, Pennsylvania, interrupted brilliant progress on these difficult problems.

In 1984, Jolicoeur [126] proposed a test about the direction of multivariate normal principal axes for the small sample case. Let $S$ be a real-valued sample covariance matrix with normalized eigenvector matrix $\Gamma$ having row vectors $\gamma_i$ as the direction cosines of the principal axes. Then the statistic

biS-tJus-'tf  - 1) 

is distributed according to the $F$ distribution with $p - 1$ and $N - p$ degrees of freedom.

Konstantinides and Yao [142] reviewed criteria used to test the effective rank $t < n$ of an observed real matrix $X$ by using its singular values. They critiqued the following test criteria and performed a perturbation analysis on the real matrix model $X = A + E$.


l\>  ll>  ■  ■  •  >  1]  >  ^X>  lUx>  ■  •  ■  >  ll 

(5.3) 

A 

A 

(5.4) 

(5.5) 

+  +  +  ,  , 

19  I  19  ■  .  19 
(5.6)

(5.7) 


Konstantinides  and  Yao  also  reported  the  following  interesting  theorems. 


Theorem 3 Let $A = (a_1, \ldots, a_n)$ be any real-valued $m \times n$ matrix with columns $a_j$. Let $\|A\|_F$ be the Frobenius norm of $A$, defined as the square root of the sum of the squares of the elements of $A$. Let the 2-norm be $\|A\|_2 = \max_{x \ne 0} \|Ax\|_2 / \|x\|_2$, where the 2-norm $\|x\|_2$ of a vector $x$ is the square root of the sum of the squares of its elements. Then the following inequalities are valid: $\max_{i,j} |a_{ij}| \le \max_j \|a_j\|_2 \le \|A\|_2 \le \|A\|_F \le \sqrt{n}\,\max_j \|a_j\|_2 \le \sqrt{mn}\,\max_{i,j} |a_{ij}|$.

Theorem 4 Let $A$, $B$, and $E$ be $m \times n$ real-valued matrices with $B = A + E$. Denote their respective singular values by $\alpha_i$, $\beta_i$, and $\epsilon_i$, where $1 \le i \le k \le \min(m, n)$, each set labeled in non-increasing order. Then $|\beta_i - \alpha_i| \le \epsilon_1 = \|E\|_2$ for $1 \le i \le k$.

Theorem 5 Let $A$, $B$, and $E$ be $m \times n$ matrices as in Theorem 4, with $A$ of rank $r$ and the 2-norm of $E$ denoted by $\epsilon_1$. If $\alpha_r > 2\epsilon_1$, then $\beta_r > \epsilon_1 \ge \beta_{r+1}$, and $B$ is said to have effective rank $r$.
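The three theorems are easy to check numerically, and Theorem 5 suggests a simple effective-rank estimate: count the singular values of $B$ that exceed $\|E\|_2$. The sketch below assumes the noise norm is known and builds an illustrative rank-3 matrix $A$ with singular values 10, 8, and 6 plus a perturbation scaled to $\|E\|_2 = 0.5$, so that $\alpha_r > 2\epsilon_1$ holds; `effective_rank` is a hypothetical helper, not a function from Konstantinides and Yao.

```python
import numpy as np

def effective_rank(b, noise_norm):
    """Hypothetical helper: count singular values of B above the noise level ||E||_2."""
    return int(np.sum(np.linalg.svd(b, compute_uv=False) > noise_norm))

rng = np.random.default_rng(5)
m, n, r = 8, 6, 3
# Illustrative rank-3 signal A = U diag(10, 8, 6) V' with orthonormal U, V
u, _ = np.linalg.qr(rng.standard_normal((m, r)))
v, _ = np.linalg.qr(rng.standard_normal((n, r)))
a = u @ np.diag([10.0, 8.0, 6.0]) @ v.T
# Perturbation E rescaled so that its 2-norm (largest singular value) is 0.5
e = rng.standard_normal((m, n))
e *= 0.5 / np.linalg.norm(e, 2)
b = a + e

alpha = np.linalg.svd(a, compute_uv=False)   # non-increasing order
beta = np.linalg.svd(b, compute_uv=False)
eps1 = np.linalg.norm(e, 2)

# Theorem 3: the chain of norm inequalities for A
col = np.linalg.norm(a, axis=0)              # column 2-norms ||a_j||_2
chain = [np.abs(a).max(), col.max(), np.linalg.norm(a, 2),
         np.linalg.norm(a, "fro"), np.sqrt(n) * col.max(),
         np.sqrt(m * n) * np.abs(a).max()]
print("Theorem 3:", all(x <= y + 1e-9 for x, y in zip(chain, chain[1:])))
# Theorem 4: each singular value moves by at most ||E||_2
print("Theorem 4:", np.all(np.abs(beta - alpha) <= eps1 + 1e-9))
# Theorem 5: alpha_r > 2 * eps1, so exactly r singular values of B exceed eps1
print("effective rank:", effective_rank(b, eps1))
```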


Horel presented a good practical review of complex principal component analysis [111]. One important property of complex principal component (CPC) analysis in the time domain that he mentions is that, since correlations between time series are heavily weighted by periods during which the amplitudes of the time series are large, more weight is given to sharp transitions and noisy spikes than to periods during which the signal varies slowly. The Hilbert transform does not act as a low-pass filter upon the data. It contains as much energy due to noise as the original data, and it may redistribute the noise to different parts of the time series. To minimize this problem, the filter weights $W(\omega)$ can be chosen so as to apply a low-pass filter to both the original data and its Hilbert transform prior to further computations.

The phase of the principal components is ambiguous. This indeterminacy becomes important when the researcher wishes to compare complex principal components obtained from independent data sets. In such cases, it is impossible to determine lead-lag relationships between the independent complex principal components by simply computing their cross-correlation, since the phase of each complex principal component is known only to within an arbitrary constant.
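The phase ambiguity can be seen directly: if $v$ is a unit eigenvector of a Hermitian covariance matrix, so is $e^{i\theta}v$ for any $\theta$, and the corresponding principal component time series is multiplied by the constant factor $e^{-i\theta}$. The data in the sketch below are hypothetical complex white noise; any Hermitian sample covariance would serve.

```python
import numpy as np

rng = np.random.default_rng(6)
t_len, p = 200, 4
# Hypothetical complex multichannel data: rows are times, columns are channels
z = rng.standard_normal((t_len, p)) + 1j * rng.standard_normal((t_len, p))
c = z.conj().T @ z / t_len                   # Hermitian sample covariance
vals, vecs = np.linalg.eigh(c)               # eigenvalues in ascending order
v = vecs[:, -1]                              # leading complex principal component
theta = 0.7                                  # arbitrary phase
v_rot = np.exp(1j * theta) * v

# Both v and e^{i theta} v are unit eigenvectors for the same eigenvalue ...
print(np.allclose(c @ v_rot, vals[-1] * v_rot), np.isclose(np.linalg.norm(v_rot), 1.0))
# ... while the principal component time series differ by a constant phase factor
pc = z @ v.conj()
pc_rot = z @ v_rot.conj()
print(np.allclose(pc_rot, np.exp(-1j * theta) * pc))
```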

In  looking  at  time  series,  the  real  and  imaginary  parts  of  complex  principal 
components  are  not  Hilbert  transforms  of  one  another.  They  do  not  necessarily 
explain  the  same  amount  of  variance  in  each  frequency  band  and  thus  the  real 
part  of  the  complex  principal  components  does  not  contain  all  the  relevant 
information. Frequency domain principal component analysis does not suffer from this problem because in that approach the principal component is a real time series.

Wong, Zhang, Reilly, and Yip proposed new eigenvalue estimates in 1990. They account for the bias incurred when the population eigenvalues are estimated by the eigenvalues of the sample covariance matrix. Let $\{\hat\lambda_m^2\}$ be the revised estimates of the corresponding population eigenvalues. These are computed in equation 5.8, with the estimated variance given in equation 5.9.


\2  =  /2  _  Am  V  _  M-fc 

‘m  N  (A2.-A?)  N 

where 


m  =  1,  •  •  • ,  fc 


(5.8) 


1 

M-k 


M 


E 

i=fc+l 


'  N  ^ 


t=l 


(5.9) 


The advantage is that these estimates provide a correction to the estimated eigenvalues that accounts for the effects of the other eigenvalues. On the other hand, these estimates no longer obey the simpler joint distribution, which makes finding the distribution of relevant test statistics more difficult. The desirability of making these corrections depends on what you want to use the answer for. The point here is that the $\{l_i^2\}$ might be biased estimates of the underlying population eigenvalues, but even more importantly they are statistics whose distribution we know.
5.2.2 Complex Statistics Other than Eigenvalue Testing

Development of joint distributions of sample eigenvalues and related test statistics requires a supporting body of distributional results. The literature regarding complex multivariate statistics is sparse and isolated.

The study of statistics of complex variables is still so young that fundamental results are still in dispute. Some of the results I need simply are not in the many references I consulted. For this reason, I have undertaken a systematic development of fundamental properties and distributions related to complex multivariate random variables. This section reviews the literature I have found on the subject.

Working with complex variables in the context of statistics dates back at least to the renaissance of statistics in the 1930s. Ingham published a paper in 1933 [114] evaluating the integral

/  I  \P(P+l)/2  r 

(s)  expi-;Mcr))[det(M-.r)j-‘(dr) 

where $A$ is positive definite real and $C$ and $T$ are real symmetric. The univariate complex Gaussian distribution was first introduced by Wooding [293]
in  1956.  He  looked  at  the  complex  Fourier  series 

$$z(t) = \sum_j (a_j - i b_j)\exp[i\omega_j t]$$

where $\omega_j$, $a_j$, and $b_j$ are real-valued coefficients. He followed some of the work previously done with pre-envelopes and analytic signals by S. O. Rice [220][221]. This work was extended by Dugundji [70] in 1958. Wooding applied
the  Hilbert  transform  to  a  real  signal  x(t)  and  formed  a  complex  variable 

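$$z(t) = x(t) + i\hat{x}(t)$$

The Hilbert transform is defined by

$$\hat{x}(t) = \frac{1}{\pi}\,\mathrm{P.V.}\int_{-\infty}^{\infty} x(t - \sigma)\,\frac{d\sigma}{\sigma}$$

with its inverse given by

$$x(t) = \frac{1}{\pi}\,\mathrm{P.V.}\int_{-\infty}^{\infty} \hat{x}(t + \sigma)\,\frac{d\sigma}{\sigma}$$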

The P.V. before the integral sign signifies that the Cauchy principal value is used in the evaluation. Some references use a bar through the integral sign instead of writing P.V. As acknowledged by Wooding, the notion of a stochastic process as being complex Gaussian was not new. Root and Pitcher [227] mention it in their paper in 1955.
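For sampled data the Hilbert transform is commonly computed in the frequency domain rather than from the principal value integral. The sketch below assumes a real, evenly sampled signal treated as periodic; it doubles the positive-frequency FFT coefficients and zeroes the negative ones, which yields $z(t) = x(t) + i\hat{x}(t)$ with the same sign convention as the definition above, so that the transform of a cosine is the corresponding sine. It is a discrete approximation, not Wooding's construction, and `analytic_signal` is a hypothetical name.

```python
import numpy as np

def analytic_signal(x):
    """Return z = x + i * H[x] for a real, periodically sampled signal x.

    The Hilbert transform is applied in the frequency domain: positive
    frequencies are doubled, negative frequencies are zeroed, and the DC
    term (and, for even length, the Nyquist term) is kept once.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

t = np.arange(256) / 256
z = analytic_signal(np.cos(2 * np.pi * 5 * t))
print(np.allclose(z.imag, np.sin(2 * np.pi * 5 * t)))   # Hilbert transform of cos is sin
```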

The next major contributor to the theory of the multivariate complex normal distribution and related statistics was N. R. Goodman [92][93]. Goodman demonstrated the relationships between complex and real vector variables. He gave explicit expression to how various properties are related. He showed that multiplication of complex scalars of the form $z = x + iy$ is the same as multiplication of matrices of the form

$$\begin{pmatrix} x & -y \\ y & x \end{pmatrix}.$$

If you replace each element
in an $n \times n$ complex matrix with the corresponding $2 \times 2$ matrix, you then have a real-valued $2n \times 2n$ matrix that acts the same under multiplication and addition as the complex matrix, except that much more computational effort is required. He proved other algebraic results as well. In statistics, he stated the density function of the zero mean vector complex normal distribution with covariance matrix $\Sigma$ and derived its characteristic function. He derived the density function for the central complex Wishart distribution and the characteristic function of a distribution related to the central complex Wishart distribution. He also derived the density function of the upper triangular square root $T$ of a Wishart matrix $W$, where $W = T^H T$. In the companion paper, Goodman derived the distribution of $\det(W)$.
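Goodman's correspondence is straightforward to code. The sketch below (the function name `realify` is hypothetical) replaces each entry $x + iy$ by the block $\begin{pmatrix} x & -y \\ y & x \end{pmatrix}$ and checks that matrix products and conjugate transposes carry over to the real representation.

```python
import numpy as np

def realify(a):
    """Hypothetical helper: map an n x m complex matrix to the 2n x 2m real matrix
    obtained by replacing each entry x + iy with the block [[x, -y], [y, x]]."""
    a = np.asarray(a, dtype=complex)
    n, m = a.shape
    out = np.empty((2 * n, 2 * m))
    out[0::2, 0::2] = a.real
    out[0::2, 1::2] = -a.imag
    out[1::2, 0::2] = a.imag
    out[1::2, 1::2] = a.real
    return out

rng = np.random.default_rng(8)
a = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
b = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
print(np.allclose(realify(a @ b), realify(a) @ realify(b)))   # products agree
print(np.allclose(realify(a.conj().T), realify(a).T))         # A^H maps to the transpose
```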

In reference [135], Khatri cited Wishart's 1948 effort in Biometrika to catalog many different methods of deriving the real Wishart distribution. Khatri said these methods use different kinds of tools, such as transformations, direct integration, characteristic functions and the inversion theorem, geometrical methods, induction, rectangular coordinates, random orthogonal transformations, orthogonal groups, etc. He pointed to Kshirsagar's Bartlett decomposition of a Wishart matrix, and also produced his own derivation, which involved a partitioning scheme. In 1963, Khatri published a paper [136] discussing conditions under which a second degree polynomial in elements of a real matrix normal variable would be Wishart. He also discussed issues of independence of the sample mean and sample covariance matrix of a real vector normal variable. As an epilogue, he remarked that the results also hold for the complex case with the appropriate changes. In 1965, Srivastava [256] published his important paper on the complex Wishart distribution, which included a powerful generalization for finding the density function of any random variable $A = BB^H$ when the density of the random variable $B$ depends on $B$ only through the form $BB^H$. Shortly thereafter, in the same year, Khatri [137] published a comprehensive review of classical statistical analysis based on the vector complex normal distribution. A paper published by Tan in 1968 in the Tamkang Journal of Mathematics [266] gives an extensive development of distribution theory related to the complex normal distribution. It has gone relatively unnoticed because it was published in Taipei, but it deserves much wider recognition. Krishnaiah [152] updated this review in his comprehensive paper of 1976. Srivastava and Khatri's 1979 book [257] on multivariate statistics treats the complex case where it can do so profitably without destroying the flow of the material. They include complex matrices in their theorems on matrix theory.

There are two texts devoted to the statistics of complex variables, both by Kenneth S. Miller. Miller's interest in statistics of complex variables is a natural extension of his earlier work [177], which has wide application to a more traditional treatment of signal processing restricted to real variables. That text includes discussions of topics such as the generalized Rayleigh distribution, Rice variates, Whittaker functions, envelope detection, Cramér-Rao bounds, Wiener-Khintchine relations, and passage of Gaussian noise through a linear filter. Miller's 1974 book [180] deals with complex stochastic processes. The next book [181] develops the theory of hypothesis testing using univariate and bivariate complex Gaussian variables. It begins with a review of Neyman-Pearson testing. Throughout, it works with the bivariate complex normal distribution, $CN_2(c, R)$. He derives the bivariate complex Wishart density function $CW_2(n, R)$ and references Goodman [92] for the density function of $CW_p(n, R)$. He addresses groups of transformations, functions invariant with respect to a group, and functions that are maximal invariant. He observes that uniformly most powerful (UMP) tests do not abound, but sometimes it is possible to find UMP invariant tests with respect to some group $G$. He also recommends further restriction to the class of unbiased tests.

Compared to other areas of statistics, the literature on statistics of complex variables appears sparse. Saxena provided a nice annotated bibliography of 60 references spanning the period from Rice's 1944 paper [220] through 1976. The largest number of references in any one year was 8, in 1972. The early works are application oriented. A short spurt of work began with Wooding's 1956 paper. Work resumed in 1963 and continued for about 7 years. Another increase in productivity began in 1970 and lasted 4 years. No papers were published in 1974, one in 1975, and two in 1976, which was the last year included in the bibliography.

The  following  references  either  are  from  the  period  1977-1991  or  are  earlier 
references  I  have  located  which  were  not  cited  in  Saxena’s  paper. 

Freedman and Lane [87] reported in 1980 that the first $n - 1$ Fourier coefficients of the discrete Fourier transform of $n$ independent complex normal variables are independent identical complex normal random variables.

Fang and Krishnaiah [79] published the asymptotic distributions of functions of eigenvalues of the complex noncentral Wishart matrix via perturbation theory in 1981.

Singh and Pillai [245] reported on the exact non-null distribution of Wilks' $L_{vc}$ criterion in the complex case for testing the hypothesis

$$H: \Sigma = \sigma^2[(1 - \rho)I + \rho e e']$$

where $\sigma > 0$ and $\rho$ are unknown, against the alternative hypothesis of inequality. The vector $e$ is a vector of all ones, $e' = (1, \ldots, 1)$.

Khatri [138] derived a test to determine if a complex Wishart matrix could be a real Wishart matrix. Andersson and Perlman [29] derived tests to determine if a $p$-dimensional sample complex covariance matrix could have come from a $p$-dimensional real multivariate distribution, and to test if a $2p$-dimensional sample real covariance matrix could be considered to have the structure of a $p$-dimensional complex multivariate distribution.

B. N. Nagarsenker, P. B. Nagarsenker, and Quinn [189] derived an asymptotic expansion of the non-central distribution of Wilks' statistic for the complex Gaussian case. Wilks' $\Lambda$ statistic is given by $\Lambda = \det(A)/\det(A + B)$, where $A \sim CW_p(n, \Sigma, 0)$ and $B \sim CW_p(m, \Sigma, \delta)$, with $\delta$ the noncentrality parameter. A nice review of the life and works of Wilks, with insightful comments on his results, is found in Anderson [25].
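Wilks' $\Lambda$ can equivalently be written in terms of the roots $\theta_i$ of $\det(B - \theta A) = 0$, since $\det(A)/\det(A + B) = 1/\det(I + A^{-1}B) = \prod_i (1 + \theta_i)^{-1}$. The sketch below checks this identity on illustrative central complex Wishart draws; the function names are hypothetical, and it does not address the noncentral distribution studied by Nagarsenker, Nagarsenker, and Quinn.

```python
import numpy as np

def wilks_lambda(a, b):
    """Wilks' Lambda = det(A) / det(A + B), computed with log-determinants."""
    _, logdet_a = np.linalg.slogdet(a)
    _, logdet_ab = np.linalg.slogdet(a + b)
    return float(np.exp(logdet_a - logdet_ab))

def wilks_lambda_from_roots(a, b):
    """Same quantity as prod 1 / (1 + theta_i), theta_i the roots of det(B - theta A) = 0."""
    theta = np.linalg.eigvals(np.linalg.solve(a, b)).real   # eigenvalues of A^{-1} B
    return float(np.prod(1.0 / (1.0 + theta)))

rng = np.random.default_rng(9)
p = 3
x = rng.standard_normal((10, p)) + 1j * rng.standard_normal((10, p))
y = rng.standard_normal((6, p)) + 1j * rng.standard_normal((6, p))
a = x.conj().T @ x      # central complex Wishart draw, n = 10 (Sigma = 2I with this scaling)
b = y.conj().T @ y      # central complex Wishart draw, m = 6, same Sigma
print(wilks_lambda(a, b), wilks_lambda_from_roots(a, b))
```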

Patil et al. published an encyclopedic dictionary of multivariate distributions [205] in 1984, which includes those defined over the field of complex numbers. A wonderful feature of this dictionary series is that it makes explicit the relationship between various distributions. This is an excellent entry point into the literature on distributions.


5.3 Zonal Polynomials, Hypergeometric Functions, Group Representation Theory

The purpose of this section is to review the development of zonal polynomials. Zonal polynomials are the key to developing the joint density function of sample eigenvalues of a complex Wishart matrix. The eigenvalues examined in this thesis follow that distribution.

The  distribution  for  the  case  of  the  real  Wishart  matrix  was  derived  by 
James  [120]  in  1964.  He  also  wrote  down  the  result  by  inspection  for  the  case 
of  a  complex  symmetric  matrix,  without  derivation. 
As of 1987, zonal polynomials had only been developed for the case of the
real  symmetric  matrix  and  the  two  matrix  argument  case  of  a  real  symmetric 
matrix  and  real  symmetric  positive  definite  matrix  [188].  Gross  and  Richards 
developed  zonal  polynomials  for  the  case  of  complex  Hermitian  matrices  in 
1987  [96].  A  contribution  of  this  thesis  is  the  application  of  their  work  to  the 
distribution  of  sample  eigenvalues  of  a  complex  matrix. 

We need zonal polynomials, hypergeometric functions, and group representation theory at all because we must evaluate the integral $\int_{U(p)} \operatorname{etr}(-\Sigma^{-1} U^H \Lambda U)\,dU$, where the integral is taken over the set $U(p)$ of all $p \times p$ unitary matrices. The function $\operatorname{etr}(X)$ is standard notation for $\exp(\operatorname{tr}(X))$ in the literature and texts dealing with distribution theory in multivariate analysis.
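An integral of this type can at least be checked numerically. The sketch below estimates $\int_{U(p)} \operatorname{etr}(-\Sigma^{-1} U^H \Lambda U)\,dU$ by Monte Carlo over Haar-distributed unitary matrices, assuming $dU$ is the normalized Haar measure and, for the comparison, that $\Sigma^{-1}$ and $\Lambda$ are diagonal. For $p = 2$ it compares the estimate with the Harish-Chandra-Itzykson-Zuber closed form, a known special case that is not the zonal-polynomial expansion developed in this thesis. Function names are illustrative.

```python
import numpy as np

def haar_unitary(p, rng):
    """Draw a p x p unitary matrix from the normalized Haar measure on U(p).

    QR-factor a matrix of iid standard complex Gaussians, then rescale each column of Q
    by the phase of the matching diagonal entry of R so that the result is exactly Haar.
    """
    z = (rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def unitary_integral_mc(sigma_inv, lam, n_draws=20000, seed=0):
    """Monte Carlo estimate of the normalized integral of etr(-Sigma^{-1} U^H Lam U) over U(p)."""
    rng = np.random.default_rng(seed)
    p = lam.shape[0]
    total = 0.0
    for _ in range(n_draws):
        u = haar_unitary(p, rng)
        total += np.exp(-np.trace(sigma_inv @ u.conj().T @ lam @ u)).real
    return total / n_draws

def hciz_2x2(a, b):
    """HCIZ closed form for p = 2: integral over U(2) of exp(tr(diag(a) U^H diag(b) U)) dU."""
    num = np.exp(a[0] * b[0] + a[1] * b[1]) - np.exp(a[0] * b[1] + a[1] * b[0])
    return num / ((a[1] - a[0]) * (b[1] - b[0]))

sigma_inv = np.diag([1.0, 2.0])
lam = np.diag([0.5, 1.5])
estimate = unitary_integral_mc(sigma_inv, lam)
exact = hciz_2x2(-np.diag(sigma_inv), np.diag(lam))
print(estimate, exact)
```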

Zonal polynomials are important to the study of the distribution of eigenvalues of a Wishart matrix. Takemura [265] has recorded a wonderful history of their development through 1984, from which I have taken many of the comments made below. Some of the works of James were briefly described in the earlier section on eigenvalue testing; the history of the development of zonal polynomials rests on these same works. James is often cited as the prime motivator for work with zonal polynomials.

In 1960, James published his paper [118] on the distributions of eigenvalues using the representation theory of the linear group. His results are given in terms of zonal polynomials. Much of the interest in zonal polynomials since 1964 has been a direct result of their application to multivariate statistics and of the paper by James [120]. In that paper, he generalized his previous work and discussed a general method for calculating zonal polynomials. Until the mid-1980s, work on zonal polynomials was done primarily by statisticians. Since then, mathematicians have started to examine zonal polynomials in the context of more general structures, which has led to new results and powerful generalizations.

Zonal polynomials form a subset of spherical functions. They are homogeneous harmonic polynomials defined on the surface of a multidimensional sphere, and they are orthogonal functions on $n$-dimensional spheres. You can think of them as generalized Legendre polynomials; in 3-dimensional space, in fact, zonal polynomials are directly proportional to Legendre polynomials [251].

Another early worker in this area is Constantine, who worked with James at least as early as 1958 [56]. In his 1963 paper [57], he worked in terms of complex symmetric matrices (not the same thing as Hermitian matrices) and defined the hypergeometric function of complex matrix argument in terms of zonal polynomials of complex symmetric matrix argument. With this, he derived the density function of the noncentral real Wishart matrix.


He also found the moments of the determinant of a noncentral real Wishart matrix. In his 1966 paper [58], Constantine defined a generalized Laguerre polynomial of complex symmetric matrix argument, which, in turn, is defined in terms of zonal polynomials of complex symmetric matrix argument. With these, he found the distribution of the generalized Hotelling's $T_0^2$ statistic, $T_0^2 = \operatorname{tr}(AB^{-1})$. Matrix $A$ is a real noncentral Wishart matrix distributed as $A \sim W_p(n, \Sigma, \Omega)$, and the real central Wishart matrix $B$ is distributed as $B \sim W_p(m, \Sigma)$. When the multivariate normal distribution underlying $A$ has mean vector $\mu$, the noncentrality parameter $\Omega$ is determined by $\mu$ and $\Sigma$.

The 1976 paper by Constantine and Muirhead [59] presents asymptotic expansions of the distributions of several very important matrices, including $A(A+B)^{-1}$, for some or all of the eigenvalues of $\Omega$ large. The matrix $A(A+B)^{-1}$ can be thought of as a generalized signal-to-(signal-plus-noise) ratio, where $A$ and $B$ are defined above. They also develop asymptotic distributions for related matrices, including $BC^{-1}$, where $C \sim W_p(k, \Sigma)$. As in earlier papers, these results are developed in terms of hypergeometric functions of matrix argument.

By 1982, the importance of zonal polynomials to the development of distributions in multivariate statistics had become recognized. Muirhead [187] (the student of both James and Constantine) published his text, which included a major chapter devoted to zonal polynomials. Muirhead develops zonal polynomials as solutions of the partial differential equation

$$\Delta_Y Z_\kappa(Y) = [\rho_\kappa + k(m-1)]\, Z_\kappa(Y),$$

where $\Delta_Y$ is the differential operator, called the Laplace-Beltrami operator, defined by

$$\Delta_Y = \sum_{i=1}^{m} y_i^2 \frac{\partial^2}{\partial y_i^2} + \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \ne i}}^{m} \frac{y_i^2}{y_i - y_j} \frac{\partial}{\partial y_i},$$

where $\rho_\kappa = \sum_{i=1}^{m} k_i(k_i - i)$ and $\kappa = (k_1, \ldots, k_m)$ is a partition of $k$, so that $k = k_1 + \cdots + k_m$. It has become traditional to use the Greek kappa ($\kappa$), the Latin letter $k$, and its subscripted parts $k_i$, even though this invites ambiguity after copying. Muirhead provides a recurrence relation for computing the coefficients of the zonal polynomials. He also sketches the group representation theory development of zonal polynomials used by James.
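The differential equation can be checked exactly on small cases. The sketch below applies $\Delta_Y$ to the real zonal polynomials of degree at most 2 in $m = 2$ variables, written in monomial form as $Z_{(1)} = y_1 + y_2$, $Z_{(2)} = y_1^2 + y_2^2 + \tfrac{2}{3}y_1y_2$, and $Z_{(1,1)} = \tfrac{4}{3}y_1y_2$, and compares the result with $[\rho_\kappa + k(m-1)]Z_\kappa$ at a rational point using exact arithmetic. The normalization is taken as an assumption, but the check does not depend on it because the equation is linear. The dictionary representation of polynomials is a hypothetical helper, not Muirhead's recurrence.

```python
from fractions import Fraction

def diff(poly, var):
    """Partial derivative of a polynomial stored as {exponent tuple: coefficient} w.r.t. y_(var+1)."""
    out = {}
    for exps, coeff in poly.items():
        if exps[var] > 0:
            new = list(exps)
            new[var] -= 1
            key = tuple(new)
            out[key] = out.get(key, 0) + coeff * exps[var]
    return out

def evaluate(poly, y):
    """Evaluate the polynomial exactly at the point y."""
    total = Fraction(0)
    for exps, coeff in poly.items():
        term = Fraction(coeff)
        for yi, e in zip(y, exps):
            term *= Fraction(yi) ** e
        total += term
    return total

def laplace_beltrami(poly, y):
    """Delta_Y at y: sum_i y_i^2 d^2/dy_i^2 + sum_i sum_{j != i} y_i^2 / (y_i - y_j) d/dy_i."""
    y = [Fraction(v) for v in y]
    value = Fraction(0)
    for i in range(len(y)):
        value += y[i] ** 2 * evaluate(diff(diff(poly, i), i), y)
        first = evaluate(diff(poly, i), y)
        for j in range(len(y)):
            if j != i:
                value += y[i] ** 2 / (y[i] - y[j]) * first
    return value

def eigenvalue(kappa, m):
    """rho_kappa + k(m - 1), where rho_kappa = sum_i k_i (k_i - i) and k = |kappa|."""
    rho = sum(k_i * (k_i - i) for i, k_i in enumerate(kappa, start=1))
    return rho + sum(kappa) * (m - 1)

# Real zonal polynomials in m = 2 variables, monomial form (normalization assumed; the check is scale-free).
zonal = {
    (1,): {(1, 0): Fraction(1), (0, 1): Fraction(1)},
    (2,): {(2, 0): Fraction(1), (0, 2): Fraction(1), (1, 1): Fraction(2, 3)},
    (1, 1): {(1, 1): Fraction(4, 3)},
}
point = (Fraction(3), Fraction(7, 2))
for kappa, poly in zonal.items():
    print(kappa, laplace_beltrami(poly, point) == eigenvalue(kappa, 2) * evaluate(poly, point))
```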

In 1984, the Institute of Mathematical Statistics (IMS) published Takemura's monograph on the subject. This was only the fourth monograph the IMS had published on any subject. Takemura defines zonal polynomials as symmetric homogeneous polynomials in the eigenvalues of a symmetric matrix. He writes down the definition and properties of complex zonal polynomials without proof, since the proofs for the real and complex cases are the same in his development. He remarks that complex zonal polynomials are simpler than real zonal polynomials, noting that the complex zonal polynomials are the same as the homogeneous symmetric polynomials called Schur functions. The explicit relationship is given by Saw's generating function, introduced by Farrell [80]. Takemura shows these to be the same via the uniqueness property of the triangular decomposition of a positive definite symmetric matrix. Note that it is possible to have a complex symmetric matrix, which is different from a Hermitian matrix. He also writes down the density functions for the zero-mean complex vector normal and the central complex Wishart distributions. In his Chapter 5, Takemura uses the symbol ~ (tilde) to denote the complexification of a theorem established for the case of real variables, and he places that symbol over variables to indicate that they apply to the complex case. Prior to the work by Gross and Richards, Takemura's development of complex zonal polynomials was the most complete I have found in the literature.
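The identification of complex zonal polynomials with Schur functions (up to normalization) is easy to explore numerically. The sketch computes Schur functions with the classical bialternant formula and checks the standard identity $(x_1 + \cdots + x_n)^k = \sum_{\lambda \vdash k} f^\lambda s_\lambda(x)$, where $f^\lambda$ is the number of standard Young tableaux of shape $\lambda$. Which constant multiple of $s_\lambda$ equals the complex zonal polynomial depends on the normalization convention and is not asserted here; all helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def det(matrix):
    """Exact determinant by the Leibniz formula (adequate for the tiny sizes used here)."""
    n = len(matrix)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = Fraction(1 if inversions % 2 == 0 else -1)
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total

def schur(partition, x):
    """Schur function s_lambda(x_1..x_n) via det[x_i^(lambda_j + n - j)] / det[x_i^(n - j)]."""
    n = len(x)
    if len(partition) > n:
        return Fraction(0)                  # more parts than variables: s_lambda vanishes
    lam = list(partition) + [0] * (n - len(partition))
    num = [[Fraction(xi) ** (lam[j] + n - 1 - j) for j in range(n)] for xi in x]
    den = [[Fraction(xi) ** (n - 1 - j) for j in range(n)] for xi in x]
    return det(num) / det(den)

def num_standard_tableaux(partition):
    """f^lambda = k! / (product of hook lengths)."""
    conj = [sum(1 for part in partition if part > c) for c in range(partition[0])]
    hooks = 1
    for r, part in enumerate(partition):
        for c in range(part):
            hooks *= (part - c - 1) + (conj[c] - r - 1) + 1
    return factorial(sum(partition)) // hooks

def partitions(k, max_part=None):
    """All partitions of k as non-increasing tuples."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest

x = (Fraction(1), Fraction(2), Fraction(5))
k = 3
lhs = sum(x) ** k
rhs = sum(num_standard_tableaux(lam) * schur(lam, x) for lam in partitions(k))
print(lhs, rhs)
```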

The development of the theory of zonal polynomials has proceeded simultaneously from a traditional physics and special functions point of view, as represented by Stein and Weiss [258], and from a mathematician's point of view, as represented by Gross and Richards [96]. The nicest introduction to zonal polynomials from an engineer's point of view is Stein and Weiss' book. Their work was done without reference to James' work. Stein and Weiss work in the field of real numbers and use differentiation, so application to the complex field must proceed cautiously. They do not develop the splitting theorem needed for this thesis.

Gross and Richards' work is a development of the theory of hypergeometric functions of matrix argument, firmly rooted in group representation theory, that simultaneously treats the cases of real, complex Hermitian, and quaternionic variables. Of importance to this work, Gross and Richards provided a development of the splitting property for zonal polynomials in the context of complex variables. In that development, they assume Hermitian matrices for the complex case rather than complex symmetric matrices; this is evident from their use of the unitary group. Their work is more mathematically motivated, and less applied, than the work by James. Compared to the work by Stein and Weiss, Gross and Richards do not include the specification of a reference point on the sphere. This prevents drawing observations about coordinate transformations made clear in the approach by Stein and Weiss. Closing the connections between these two works is a valuable task that remains to be done.

Gross and Richards published a continuation [97] of their studies in 1989 which introduces the concept of total positivity in the context of spherical series and hypergeometric functions of matrix argument. They point out that the spherical function known to mathematicians is the zonal polynomial known to statisticians. They also remark that, up to scalar multiples, the spherical functions coincide with the Schur functions. They show the Euler integral

$$\frac{[a]_m}{[b]_m}\, Z_m(t) = \frac{\Gamma_n(b)}{\Gamma_n(a)\,\Gamma_n(b-a)} \int_{0<r<I} Z_m(rt)\, |\det(r)|^{a-n} \det(I-r)^{b-a-n}\, dr,$$

where

$\operatorname{Re}(a) > n-1$,
$\operatorname{Re}(b-a) > n-1$,
$t \in S_n$,
$r$ ranges over the Hermitian matrices whose eigenvalues are between 0 and 1,
$\Gamma_n(a) = \prod_{i=1}^{n} \Gamma(a-i+1)$, and $[a]_m$ denotes the generalized Pochhammer symbol,

as an example of a reproducing integral formula. It would be good to look at this in the context of section 2.2 and chapter 3 of Fowler's thesis [86] on reproducing kernel Hilbert spaces, because Krantz [143] showed that zonal polynomials are reproducing kernels.
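For $n = 1$ the spherical functions reduce to $Z_m(t) = t^m$, the gamma factors to ordinary gamma functions, and $[a]_m$ to the rising factorial $(a)_m$, so an Euler integral of this type collapses to the classical Beta-integral identity $\frac{(a)_m}{(b)_m} t^m = \frac{\Gamma(b)}{\Gamma(a)\Gamma(b-a)} \int_0^1 (rt)^m r^{a-1}(1-r)^{b-a-1}\,dr$. The sketch below checks this scalar case with Simpson's rule. It is a sanity check of the scalar specialization only; the parameter values are arbitrary.

```python
import math

def pochhammer(a, m):
    """Rising factorial (a)_m = a (a + 1) ... (a + m - 1)."""
    out = 1.0
    for i in range(m):
        out *= a + i
    return out

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def euler_rhs_scalar(m, t, a, b):
    """Gamma(b) / (Gamma(a) Gamma(b - a)) * integral_0^1 (r t)^m r^(a-1) (1 - r)^(b-a-1) dr."""
    const = math.gamma(b) / (math.gamma(a) * math.gamma(b - a))
    integrand = lambda r: (r * t) ** m * r ** (a - 1) * (1 - r) ** (b - a - 1)
    return const * simpson(integrand, 0.0, 1.0)

m, t, a, b = 3, 0.7, 2.5, 6.0          # requires a > 0 and b - a > 0 when n = 1
lhs = pochhammer(a, m) / pochhammer(b, m) * t ** m
rhs = euler_rhs_scalar(m, t, a, b)
print(lhs, rhs)
```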


Chapter 6
STATISTICAL TESTS

This chapter provides distributional results for test statistics that examine sample eigenvalues to gain understanding of the underlying parameter eigenvalues. It is in this chapter that the thesis topic is most directly addressed. You will observe that I have answered only special cases of the thesis question.

The tests of greatest interest take a set of samples and form one statistic upon which decisions are based. These make the most efficient use of the data, but they also have sampling distributions that are very difficult to compute. A compromise is to partition the data set into independent sets and form a test statistic from these sets. This approach does not make efficient use of the data. One version of this approach results in a test statistic that is easy to compute and whose sampling distribution is a standard one, familiar to statisticians.

Several approaches are presented in this chapter. The first approach arbitrarily partitions the data into two independent sets and forms an F-statistic from the ratio of independent sums of the sample eigenvalues. A second approach partitions the data into one block assumed to be noise-only and another partition that possibly contains a signal. The result is the joint distribution of the sample eigenvalues of the signal-plus-noise sample covariance matrix. This is the form of Schmidt's MUSIC problem [238].


A third approach is to work with data transformed into the form of a diagonal matrix

$$\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$$

and to obtain the distribution of the statistic $x = a/b$. This requires assuming a population covariance matrix, which can be part of the null hypothesis of a test. The ratio of sample eigenvalue sums or averages belongs to this class.

A fourth approach originates as a likelihood ratio test statistic that requires only partial knowledge of the population covariance matrix. For the case of real variables, the asymptotic distribution of this sphericity test was derived by Anderson [24]. I have provided the joint density of this statistic with some nuisance variables. For the case $p = 2$, I have provided the density and cumulative distribution function.

The last approach I examined, and the one of greatest interest in the general case, involves simple transformations of the sampling distribution of the eigenvalues. I have assumed the special case in which the sample eigenvalues $D$ have the joint distribution $CW_p(n, \Lambda)$. The statistics for which I computed distributions were motivated by Krishnaiah's works (which will be referenced in their respective discussions). This section is the culmination of the supporting work in the appendices, on both complex variables and zonal polynomial theory. There is still a great deal of work left to extend these results to the general case.
6.1 Tests Based on Two Independent Sets of Samples

6.1.1 F-Statistic from Ratio of Independent Sums of Sample Eigenvalues

Consider the following procedure. Assume that all samples are statistically independent. Then it is possible to define an arbitrary partition of the sample set, splitting it into two sets. Form a statistic within each of the partitions and then compare the statistics. For example, let the statistic in one set be the sum of the $m_1$ largest sample eigenvalues, and let the statistic in the other set be the sum of the $m_2$ smallest sample eigenvalues. Because the two statistics were obtained from independent samples, the statistics themselves are independent. We know that suitably scaled quadratic forms $c^H W c$ of a complex Wishart matrix $W$, which are linear combinations of its elements, yield chi-square random variables. The independence of these statistics gives us hope that an F-statistic can be formed. A benefit is that the F-distribution is one of the most widely known and used distributions in statistics; its properties have long been known.

Recall that if $\chi_1^2$ has the noncentral chi-square distribution with parameters $\nu_1$ and $\delta_1$, $\chi_2^2$ has the noncentral chi-square distribution with parameters $\nu_2$ and $\delta_2$, and $\chi_1^2$ and $\chi_2^2$ are independent, then $y = (\nu_2 \chi_1^2)/(\nu_1 \chi_2^2)$ has the doubly noncentral F-distribution with parameters $\nu_1$, $\nu_2$, $\delta_1$, and $\delta_2$. The parameters $\nu_1$ and $\nu_2$ are usually called “degrees of freedom”.


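Theorem 6 Let $W_1 \sim CW_p(n_1, \Sigma_1, \delta_1)$ and $W_2 \sim CW_q(n_2, \Sigma_2, \delta_2)$ be independent complex Wishart random variables. Let $c_1$ be a $p \times 1$ vector of known fixed constants, and let $c_2$ be a $q \times 1$ vector of known constants. Then

$$F = \frac{\dfrac{2c_1^H W_1 c_1}{2n_1\, c_1^H \Sigma_1 c_1}}{\dfrac{2c_2^H W_2 c_2}{2n_2\, c_2^H \Sigma_2 c_2}} = \frac{n_2\, c_1^H W_1 c_1\; c_2^H \Sigma_2 c_2}{n_1\, c_2^H W_2 c_2\; c_1^H \Sigma_1 c_1} \sim \operatorname{dncF}\left(2n_1,\ 2n_2,\ \frac{2c_1^H \delta_1 c_1}{c_1^H \Sigma_1 c_1},\ \frac{2c_2^H \delta_2 c_2}{c_2^H \Sigma_2 c_2}\right).$$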


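Proof. By theorem 54,

$$c_1^H W_1 c_1 \sim CW_1(n_1,\ c_1^H \Sigma_1 c_1,\ c_1^H \delta_1 c_1)$$

and

$$c_2^H W_2 c_2 \sim CW_1(n_2,\ c_2^H \Sigma_2 c_2,\ c_2^H \delta_2 c_2).$$

Let $c_1^H \Sigma_1 c_1$ and $c_2^H \Sigma_2 c_2$ be positive. This is satisfied if $\Sigma_1$ and $\Sigma_2$ are positive definite and $c_1$ and $c_2$ are nonzero. Then by theorem 53 we know

$$\frac{2c_1^H W_1 c_1}{c_1^H \Sigma_1 c_1} \sim \chi^2_{2n_1}\left(\frac{2c_1^H \delta_1 c_1}{c_1^H \Sigma_1 c_1}\right)$$

and

$$\frac{2c_2^H W_2 c_2}{c_2^H \Sigma_2 c_2} \sim \chi^2_{2n_2}\left(\frac{2c_2^H \delta_2 c_2}{c_2^H \Sigma_2 c_2}\right).$$

Taking the ratio of these terms, each divided by its respective degrees of freedom, gives us the doubly noncentral F-distributed random variable $F$. □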
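Theorem 6 can be checked by simulation in the central case $\delta_1 = \delta_2 = 0$, where $F$ should follow the ordinary $F(2n_1, 2n_2)$ distribution with mean $2n_2/(2n_2 - 2) = n_2/(n_2 - 1)$. The sketch generates complex Wishart matrices as $W = \sum_k z_k z_k^H$ with $z_k$ iid $CN(0, \Sigma)$. The covariance matrices, the vectors $c_1$ and $c_2$, and the sample sizes are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def complex_wishart_batch(n, sigma, reps, rng):
    """reps independent central complex Wishart draws W = sum_k z_k z_k^H, z_k iid CN(0, sigma)."""
    p = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)
    z_std = (rng.standard_normal((reps, p, n)) + 1j * rng.standard_normal((reps, p, n))) / np.sqrt(2)
    z = chol @ z_std                                  # broadcasts over the batch axis
    return z @ np.conj(np.swapaxes(z, -1, -2))

def theorem6_statistic(w1, w2, c1, c2, sigma1, sigma2, n1, n2):
    """F = n2 (c1^H W1 c1)(c2^H Sigma2 c2) / [n1 (c2^H W2 c2)(c1^H Sigma1 c1)], one value per draw."""
    q1 = np.einsum("i,rij,j->r", c1.conj(), w1, c1).real
    q2 = np.einsum("i,rij,j->r", c2.conj(), w2, c2).real
    s1 = (c1.conj() @ sigma1 @ c1).real
    s2 = (c2.conj() @ sigma2 @ c2).real
    return n2 * q1 * s2 / (n1 * q2 * s1)

rng = np.random.default_rng(1)
p, n1, n2, reps = 3, 5, 8, 20000
g1 = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
sigma1 = g1 @ g1.conj().T + np.eye(p)                # arbitrary Hermitian positive definite (p = 3)
sigma2 = np.array([[2.0, 0.5j], [-0.5j, 1.0]])       # Hermitian positive definite (q = 2)
c1 = np.array([1.0, 1j, 0.0])
c2 = np.array([0.0, 1.0])
F = theorem6_statistic(complex_wishart_batch(n1, sigma1, reps, rng),
                       complex_wishart_batch(n2, sigma2, reps, rng),
                       c1, c2, sigma1, sigma2, n1, n2)
print(F.mean(), n2 / (n2 - 1))                       # central F(2 n1, 2 n2) mean
```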




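Patil et al. (pp. 142-143) [204] catalog the doubly noncentral F-distribution

$$\operatorname{dncF}(\nu_1, \nu_2, \delta_1, \delta_2).$$

A random variable $x$ has the doubly noncentral F-distribution with parameters $\nu_1$, $\nu_2$, $\delta_1$, and $\delta_2$ if its probability density function is

$$f(x) = \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} \frac{e^{-(\delta_1+\delta_2)/2}\,(\delta_1/2)^j\,(\delta_2/2)^k}{j!\,k!\,B\!\left(\frac{\nu_1}{2}+j,\ \frac{\nu_2}{2}+k\right)} \left(\frac{\nu_1 x}{\nu_1 x+\nu_2}\right)^{\nu_1/2+j} \left(\frac{\nu_2}{\nu_1 x+\nu_2}\right)^{\nu_2/2+k} \frac{1}{x},$$

where $x > 0$. The numbers $\nu_1, \nu_2$ are positive integers, and $\delta_1, \delta_2 \ge 0$. The function

$$B(p, q) = \int_0^1 x^{p-1}(1-x)^{q-1}\,dx, \qquad p > 0,\ q > 0,$$

is the beta function. A famous identity is $B(p, q) = \Gamma(p)\Gamma(q)/\Gamma(p+q)$, where $\Gamma$ is the gamma function.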
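The double series above can be evaluated directly as a Poisson mixture of beta-type terms. The sketch below truncates both Poisson sums once the weights become negligible, confirms that the series reduces to the ordinary F density when $\delta_1 = \delta_2 = 0$, and checks numerically that a noncentral case integrates to about one. The truncation tolerance, integration range, and parameter values are assumptions chosen for illustration.

```python
import math

def poisson_weights(mean, tol=1e-17, max_terms=2000):
    """Poisson probabilities e^{-mean} mean^j / j!, truncated once past the mode and negligible."""
    weights, w = [], math.exp(-mean)
    for j in range(max_terms):
        weights.append(w)
        w *= mean / (j + 1)
        if j >= mean and w < tol:
            break
    return weights

def dncf_pdf(x, nu1, nu2, delta1, delta2):
    """Doubly noncentral F density evaluated from the (truncated) double series."""
    if x <= 0:
        return 0.0
    u = nu1 * x / (nu1 * x + nu2)                  # so 1 - u = nu2 / (nu1 x + nu2)
    total = 0.0
    for j, wj in enumerate(poisson_weights(delta1 / 2)):
        for k, wk in enumerate(poisson_weights(delta2 / 2)):
            a, b = nu1 / 2 + j, nu2 / 2 + k
            log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
            total += wj * wk * math.exp(a * math.log(u) + b * math.log1p(-u) - log_beta)
    return total / x

def central_f_pdf(x, nu1, nu2):
    """Ordinary (central) F density, for comparison when delta1 = delta2 = 0."""
    log_beta = math.lgamma(nu1 / 2) + math.lgamma(nu2 / 2) - math.lgamma((nu1 + nu2) / 2)
    return math.exp((nu1 / 2) * math.log(nu1 / nu2) + (nu1 / 2 - 1) * math.log(x)
                    - ((nu1 + nu2) / 2) * math.log1p(nu1 * x / nu2) - log_beta)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

print(dncf_pdf(1.3, 4, 6, 0.0, 0.0), central_f_pdf(1.3, 4, 6))
total_mass = simpson(lambda t: dncf_pdf(t, 4, 6, 2.0, 1.0), 0.0, 80.0)
print(total_mass)                                  # close to 1 (small tail beyond 80 omitted)
```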

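Under the null hypothesis $H_0: c_1^H \Sigma_1 c_1 = c_2^H \Sigma_2 c_2$, the test statistic is distributed as

$$F = \frac{\dfrac{2c_1^H W_1 c_1}{2n_1\, c_1^H \Sigma_1 c_1}}{\dfrac{2c_2^H W_2 c_2}{2n_2\, c_2^H \Sigma_2 c_2}} = \frac{n_2\, c_1^H W_1 c_1\; c_2^H \Sigma_2 c_2}{n_1\, c_2^H W_2 c_2\; c_1^H \Sigma_1 c_1} = \frac{n_2\, c_1^H W_1 c_1}{n_1\, c_2^H W_2 c_2} \sim \operatorname{dncF}\left(2n_1,\ 2n_2,\ \frac{2c_1^H \delta_1 c_1}{c_1^H \Sigma_1 c_1},\ \frac{2c_2^H \delta_2 c_2}{c_2^H \Sigma_2 c_2}\right).$$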

This test allows you to compare special linear combinations of elements of the complex Wishart matrix. It is particularly useful if you want to compare any two elements on the main diagonal of the complex Wishart matrix (take $c_1$ and $c_2$ to be the corresponding unit vectors). Establish a null hypothesis (or default assumption) that the two quadratic forms $c_1^H W_1 c_1$ and $c_2^H W_2 c_2$ have the same population scale, $H_0: c_1^H \Sigma_1 c_1 = c_2^H \Sigma_2 c_2$, and form the test statistic $F = n_2\, c_1^H W_1 c_1 / (n_1\, c_2^H W_2 c_2)$. If the null hypothesis is rejected at your chosen $\alpha$ level of significance, then you conclude the alternative hypothesis $H_a: c_1^H \Sigma_1 c_1 \ne c_2^H \Sigma_2 c_2$.

This  test  can  be  applied  sequentially  to  discover  the  order  of  the  underlying 
system. 

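Corollary 1 Let $W_1 \sim CW_p(n_1, \Sigma_1, \delta_1)$ and $W_2 \sim CW_q(n_2, \Sigma_2, \delta_2)$ be independent complex Wishart random variables. Let $c_1$ be a $p \times 1$ vector of known fixed constants, and let $c_2$ be a $q \times 1$ vector of known fixed constants. Let $W_1 = U_1 L_1 U_1^H$ and $W_2 = U_2 L_2 U_2^H$ be the eigenvalue decompositions of $W_1$ and $W_2$, respectively. Then

$$F = \frac{\dfrac{2c_1^H L_1 c_1}{2n_1\, c_1^H U_1^H \Sigma_1 U_1 c_1}}{\dfrac{2c_2^H L_2 c_2}{2n_2\, c_2^H U_2^H \Sigma_2 U_2 c_2}} = \frac{n_2\, c_1^H L_1 c_1\; c_2^H U_2^H \Sigma_2 U_2 c_2}{n_1\, c_2^H L_2 c_2\; c_1^H U_1^H \Sigma_1 U_1 c_1} \sim \operatorname{dncF}\left(2n_1,\ 2n_2,\ \frac{2c_1^H U_1^H \delta_1 U_1 c_1}{c_1^H U_1^H \Sigma_1 U_1 c_1},\ \frac{2c_2^H U_2^H \delta_2 U_2 c_2}{c_2^H U_2^H \Sigma_2 U_2 c_2}\right).$$

Proof. From theorem 53 we know that

$$\frac{2c_1^H L_1 c_1}{c_1^H U_1^H \Sigma_1 U_1 c_1} \sim \chi^2_{2n_1}\left(\frac{2c_1^H U_1^H \delta_1 U_1 c_1}{c_1^H U_1^H \Sigma_1 U_1 c_1}\right)$$

and

$$\frac{2c_2^H L_2 c_2}{c_2^H U_2^H \Sigma_2 U_2 c_2} \sim \chi^2_{2n_2}\left(\frac{2c_2^H U_2^H \delta_2 U_2 c_2}{c_2^H U_2^H \Sigma_2 U_2 c_2}\right).$$

Taking the ratio, each divided by its respective degrees of freedom, gives us a doubly noncentral F-distributed random variable. □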
A common practical situation is one where $W_1$ and $W_2$ arise from sampling a common complex vector normal distribution with $\Sigma_1 = \Sigma_2$. For $W_1$ and $W_2$ obtained from independent samples, we know $W_1 \ne W_2$ with probability 1. Thus $U_1 \ne U_2$, and we have no hope of finding a distribution of the ratios that is independent of $\Sigma$.

6.1.2 Density of Eigenvalues of Sample Signal Plus Noise Covariance Matrix with Respect to Independent Sample Noise-Only Covariance Matrix

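Theorem 7 Let $A_1 \sim CW_p(m, \Sigma)$ and $B_1 \sim CW_p(n, \Sigma)$ be independent, where $m, n \ge p$. Then the joint density of the unordered roots $\ell_1, \ldots, \ell_p$ of $\det(A_1 - \ell B_1) = 0$, which we sort for testing, is

$$f(L) = p!\, g(L), \qquad g(L) = C_2 \prod_{i=1}^{p} \ell_i^{\,m-p} (1+\ell_i)^{-(m+n)} \prod_{i<j} (\ell_i - \ell_j)^2,$$

where $L = \operatorname{diag}(\ell_1, \ldots, \ell_p)$ and $C_2$ is defined by

$$C_2 = \frac{\pi^{p(p-1)}\, C\Gamma_p(m+n)}{C\Gamma_p(m)\, C\Gamma_p(n)\, C\Gamma_p(p)}.$$

This is a complexification of Anderson's theorem 13.2.2 (pp. 522-530) [26].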
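The constant $C_2$ and the product kernel $\prod_i \ell_i^{m-p}(1+\ell_i)^{-(m+n)}\prod_{i<j}(\ell_i - \ell_j)^2$ are straightforward to evaluate on the log scale, which avoids overflow for moderate $m$ and $n$. The sketch assumes $C\Gamma_p$ is the standard complex multivariate gamma function $C\Gamma_p(a) = \pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(a-i+1)$; with that definition $C_2$ reduces to $1/B(m, n)$ when $p = 1$, consistent with $\ell/(1+\ell)$ having a Beta$(m, n)$ distribution. Function names are illustrative.

```python
import math

def log_complex_multigamma(a, p):
    """log CGamma_p(a) = (p(p-1)/2) log(pi) + sum_{i=1}^p log Gamma(a - i + 1)."""
    return 0.5 * p * (p - 1) * math.log(math.pi) + sum(math.lgamma(a - i + 1) for i in range(1, p + 1))

def log_c2(m, n, p):
    """log C2 with C2 = pi^{p(p-1)} CGamma_p(m+n) / (CGamma_p(m) CGamma_p(n) CGamma_p(p))."""
    return (p * (p - 1) * math.log(math.pi) + log_complex_multigamma(m + n, p)
            - log_complex_multigamma(m, p) - log_complex_multigamma(n, p)
            - log_complex_multigamma(p, p))

def log_root_kernel(ells, m, n):
    """log of prod_i l_i^(m-p) (1 + l_i)^-(m+n) * prod_{i<j} (l_i - l_j)^2 for positive, distinct roots."""
    p = len(ells)
    value = sum((m - p) * math.log(ell) - (m + n) * math.log1p(ell) for ell in ells)
    for i in range(p):
        for j in range(i + 1, p):
            value += 2 * math.log(abs(ells[i] - ells[j]))
    return value

print(math.exp(log_c2(3, 4, 1)), math.exp(log_c2(2, 2, 2)))
print(log_c2(10, 15, 3) + log_root_kernel([2.0, 0.8, 0.1], 10, 15))
```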


Discussion. In the context of signal processing, the matrix $A_1$ can be taken to be the sample covariance matrix of a deterministic signal plus random noise measured during the time period of interest. The matrix $B_1$ can be the sample covariance matrix measured during a period when the signal is assumed to be absent but the noise remains the same as when the signal was measured. This theorem then gives the density of the eigenvalues of the sample signal-plus-noise covariance matrix with respect to an independently measured sample noise-only covariance matrix. The numbers of samples used to estimate the two covariance matrices are allowed to differ. Note that when $B_1$ is nonsingular, the result is also the joint density of the roots of $\det(A_1 B_1^{-1} - \ell I) = 0$ or of variations such as $\det(B_1^{-1/2} A_1 B_1^{-1/2} - \ell I) = 0$. Thus $A_1 B_1^{-1}$, $B_1^{-1/2} A_1 B_1^{-1/2}$, or an analogous matrix built from another factorization of $B_1$ (depending on the factorization theorem you use) has the interpretation of a generalized (signal-plus-noise)-to-noise ratio.
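The remark that the roots do not depend on which factorization of $B_1$ is used can be illustrated numerically. The sketch computes the roots of $\det(A_1 - \ell B_1) = 0$ once from the Hermitian matrix $R^{-1} A_1 R^{-H}$, where $B_1 = RR^H$ is a Cholesky factorization, and once from the non-Hermitian matrix $B_1^{-1} A_1$. The rank-one "signal" direction and strength are hypothetical; because a signal is present, this draw is an alternative rather than the null setting of Theorem 7.

```python
import numpy as np

def complex_wishart(n, sigma, rng):
    """Central complex Wishart draw W = sum_k z_k z_k^H with z_k iid CN(0, sigma)."""
    p = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)
    z_std = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
    z = chol @ z_std
    return z @ z.conj().T

def roots_cholesky(a1, b1):
    """Roots of det(A1 - l B1) = 0, largest first, via the Hermitian R^{-1} A1 R^{-H} with B1 = R R^H."""
    r_inv = np.linalg.inv(np.linalg.cholesky(b1))
    return np.sort(np.linalg.eigvalsh(r_inv @ a1 @ r_inv.conj().T))[::-1]

def roots_solve(a1, b1):
    """Same roots from the non-Hermitian B1^{-1} A1; any imaginary parts are round-off only."""
    return np.sort(np.linalg.eigvals(np.linalg.solve(b1, a1)).real)[::-1]

rng = np.random.default_rng(2)
p, m, n = 4, 12, 20
noise = np.eye(p)
steer = np.exp(1j * np.pi * 0.3 * np.arange(p))     # hypothetical rank-one signal direction
signal_plus_noise = noise + 2.0 * np.outer(steer, steer.conj())
A1 = complex_wishart(m, signal_plus_noise, rng)    # interval with signal present
B1 = complex_wishart(n, noise, rng)                # independent noise-only interval
print(roots_cholesky(A1, B1))
```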

Compare the problem being treated here with the work on MUSIC by Schmidt [238]. For this theorem to apply, we need the population covariance matrix to be the same for the two sampled matrices under the null hypothesis. Deflate the sample covariance matrix of the signal-plus-noise by the eigenvalues thought to be due to a signal component, and call this deflated matrix $A_1$. If the signal component truly has been removed, then none of the eigenvalues of $A_1$ should be due to a signal. Under the null hypothesis $H_0: P = 0$, the $A_1$ here is the $S = APA^H + \lambda S_0$ of Schmidt, and the $\ell B_1$ here is the $\lambda S_0$ of Schmidt. Note that no deflation is required for the initial detection problem (in the absence of interference) under the null hypothesis that there is no signal.

Proof. The proof presented here parallels the proof Anderson provided for the case of real variables, with the proper Jacobians substituted and other appropriate modifications made. The strategy is to find the joint distribution of an intermediate matrix $E$ and the roots of $\det[A - f(A+B)] = 0$, where $f$ is a scalar. Then observe that $E$ and $F = \operatorname{diag}(f_1, \ldots, f_p)$ are statistically independent. Find the density of $E$ and divide it into the joint density to obtain the density of $F$. Change variables from $F$ to $L = \operatorname{diag}(\ell_1, \ldots, \ell_p)$ to obtain the density of the roots of $\det(A_1 - \ell B_1) = 0$. A difference from Anderson's work is my consideration of unordered versus ordered eigenvalues and of the process by which the sorted eigenvalues are obtained. The algebra is straightforward, but the original choice of the changes of variables (which I copied from Anderson) that allows the solution to be obtained requires uncommon insight.

Begin with the general eigenvalue problem

$$A_1 x_1 = \ell B_1 x_1 \qquad (6.1)$$

where $\ell$ is the eigenvalue and $x_1$ is the associated eigenvector of $A_1$ with respect to (or in the metric of) $B_1$. The first simplification is a transformation to standardize the covariance to the identity matrix. Choose a matrix $C$ so that $C\Sigma C^H = I$. Let $A = CA_1C^H$ and $B = CB_1C^H$. By theorem 54, $A \sim CW_p(m, I)$ and $B \sim CW_p(n, I)$. Also note that

$$\det(A - \ell B) = \det(CA_1C^H - \ell CB_1C^H) = \det[C(A_1 - \ell B_1)C^H] = \det(C)\det(A_1 - \ell B_1)\det(C^H) = 0$$

implies $\det(A_1 - \ell B_1) = 0$ when $\det(C) \ne 0$. So the eigenvalues of $A$ with respect to $B$ are also the eigenvalues of $A_1$ with respect to $B_1$. If we premultiply $(A - \ell B)x = 0$ by $C^{-1}$, we observe that

$$0 = C^{-1}(A - \ell B)x = (C^{-1}CA_1C^H - \ell C^{-1}CB_1C^H)x = (A_1 - \ell B_1)C^H x.$$

Thus $x_1 = C^H x$ relates the eigenvectors.

Now for the trick. Consider the eigenvalues $\{f_i\}_1^p$ that satisfy

$$\det[A - f(A+B)] = 0$$

and the eigenvectors $\{y_i\}_1^p$ satisfying

$$[A - f_i(A+B)]\,y_i = 0. \qquad (6.2)$$

Observe that when $f_i \ne 1$ this can be written $\left[A - \frac{f_i}{1-f_i} B\right] y_i = 0$. So the eigenvalues of equation 6.1 are related by $\ell_i = f_i/(1-f_i)$.
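The relation $\ell_i = f_i/(1-f_i)$ is easy to confirm numerically. The sketch below draws $A \sim CW_p(m, I)$ and $B \sim CW_p(n, I)$, computes both sets of roots, and also checks one simple moment: $\sum_i f_i = \operatorname{tr}[(A+B)^{-1}A] = \sum_{k=1}^{m} z_k^H (A+B)^{-1} z_k$, and because the $m+n$ Gaussian vectors making up $A+B$ are exchangeable and their terms sum to $p$, $E[\sum_i f_i] = pm/(m+n)$. Sample sizes and the seed are arbitrary choices.

```python
import numpy as np

def complex_wishart_identity(n, p, reps, rng):
    """reps independent draws of CW_p(n, I): W = Z Z^H with Z a p x n matrix of iid CN(0, 1)."""
    z = (rng.standard_normal((reps, p, n)) + 1j * rng.standard_normal((reps, p, n))) / np.sqrt(2)
    return z @ np.conj(np.swapaxes(z, -1, -2))

rng = np.random.default_rng(3)
p, m, n, reps = 3, 6, 8, 4000
A = complex_wishart_identity(m, p, reps, rng)
B = complex_wishart_identity(n, p, reps, rng)

# l_i: roots of det(A - l B) = 0;  f_i: roots of det[A - f(A + B)] = 0; both sorted ascending.
# f -> f / (1 - f) is increasing on [0, 1), so the two sorted lists correspond entry by entry.
ell = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real, axis=-1)
f = np.sort(np.linalg.eigvals(np.linalg.solve(A + B, A)).real, axis=-1)

max_rel_err = np.max(np.abs(ell - f / (1 - f)) / ell)
mean_sum_f = f.sum(axis=-1).mean()
print(max_rel_err, mean_sum_f, p * m / (m + n))
```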

To proceed, establish an ordering on the eigenvalues $\{f_i\}_1^p$. This ordering will determine the ordering of the associated eigenvectors $\{y_i\}_1^p$ that make up the matrix $Y$. The derivation does not depend on which ordering you choose, as long as it remains fixed for the remainder of the derivation. We will observe from equation 6.3 that the ordering of the eigenvalues enters $f(F)$ through the $\prod_{i<j}^{p} (f_i - f_j)^2$ term. This differs from Anderson's work: I have control over the ordering through the algorithms used to extract eigenvalues. Given any selection of eigenvalues and associated eigenvectors, I form the ordered set by sorting the eigenvalues. Thus I actually want to end up with $p!\,f(F)$, since I am concerned with the case of unordered eigenvalues that are then sorted. See section E.6. Okamoto [197] shows that the probability of two roots being equal is zero. Define

$$F = \begin{pmatrix} f_1 & & \\ & \ddots & \\ & & f_p \end{pmatrix}$$

and $Y = (y_1, \ldots, y_p)$. Then equation 6.2 can be rewritten as $AY = (A+B)YF$. Suppose that $Y^H(A+B)Y = I$. Then

$$Y^H A Y = Y^H(A+B)YF = F.$$

Multiplying by $(Y^H)^{-1}$ and $Y^{-1}$, we see that $A + B = (Y^H)^{-1}Y^{-1}$ and $A = (Y^H)^{-1}FY^{-1}$. The next simplification is to let $E = Y^{-1}$. Then

$$A + B = E^H E = G,$$

$$A = E^H F E,$$

$$B = E^H E - E^H F E = E^H(I - F)E.$$

Now the known variables $(A, B)$ are expressed in terms of the variable I want ($F$) and a nuisance variable ($E$).
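The change of variables $A = E^H F E$, $B = E^H(I-F)E$ can be constructed explicitly. One convenient route (an implementation choice, not necessarily the one used in the thesis) whitens $G = A + B$ with its Cholesky factor $G = RR^H$ and diagonalizes the Hermitian matrix $R^{-1}AR^{-H}$; the sketch below then checks the identities numerically. The ordering of the $f_i$ is whatever the eigensolver returns (ascending), which is one fixed, arbitrary ordering as the derivation allows. Function names are illustrative.

```python
import numpy as np

def simultaneous_diagonalize(a, b):
    """Return Y and F = diag(f) with Y^H (A + B) Y = I and Y^H A Y = F."""
    r_inv = np.linalg.inv(np.linalg.cholesky(a + b))   # G = A + B = R R^H
    f, v = np.linalg.eigh(r_inv @ a @ r_inv.conj().T)    # unitary V, f in ascending order
    return r_inv.conj().T @ v, np.diag(f)                # Y = R^{-H} V

rng = np.random.default_rng(4)
p = 3
z_a = (rng.standard_normal((p, 7)) + 1j * rng.standard_normal((p, 7))) / np.sqrt(2)
z_b = (rng.standard_normal((p, 9)) + 1j * rng.standard_normal((p, 9))) / np.sqrt(2)
A, B = z_a @ z_a.conj().T, z_b @ z_b.conj().T            # draws of CW_3(7, I) and CW_3(9, I)

Y, F = simultaneous_diagonalize(A, B)
E = np.linalg.inv(Y)                                     # E = Y^{-1}
I = np.eye(p)
print(np.diag(F))                                        # the f_i, each strictly between 0 and 1
```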
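Recall that the eigenvectors are unique apart from a scale factor. The restriction

$$Y^H(A+B)Y = Y^H G Y = I$$

determines $Y$ up to a phase factor, where $Y = (y_1, \ldots, y_p)$. Consider

$$Y_\omega = (e^{i\omega_1}y_1,\ e^{i\omega_2}y_2,\ \ldots,\ e^{i\omega_p}y_p).$$

Then

$$Y_\omega^H G Y_\omega = \begin{pmatrix} y_1^H G y_1 & e^{-i(\omega_1-\omega_2)} y_1^H G y_2 & \cdots & e^{-i(\omega_1-\omega_p)} y_1^H G y_p \\ e^{-i(\omega_2-\omega_1)} y_2^H G y_1 & y_2^H G y_2 & \cdots & e^{-i(\omega_2-\omega_p)} y_2^H G y_p \\ \vdots & \vdots & \ddots & \vdots \\ e^{-i(\omega_p-\omega_1)} y_p^H G y_1 & e^{-i(\omega_p-\omega_2)} y_p^H G y_2 & \cdots & y_p^H G y_p \end{pmatrix}.$$

Because $y_i^H G y_j = \delta_{ij}$, we know that

$$Y_\omega^H G Y_\omega = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = I.$$

So each eigenvector can be multiplied by a constant phase shift and still satisfy its orthogonality relation. From $E = Y^{-1}$ we know $EY = I$. Let $e_1, \ldots, e_p$ denote the rows of $E$, and let

$$E_\theta = \begin{pmatrix} e^{i\theta_1} e_1 \\ \vdots \\ e^{i\theta_p} e_p \end{pmatrix}.$$

Then

$$E_\theta Y_\omega = \begin{pmatrix} e^{i(\omega_1+\theta_1)} e_1 y_1 & \cdots & e^{i(\omega_p+\theta_1)} e_1 y_p \\ \vdots & \ddots & \vdots \\ e^{i(\omega_1+\theta_p)} e_p y_1 & \cdots & e^{i(\omega_p+\theta_p)} e_p y_p \end{pmatrix} = \begin{pmatrix} e^{i(\omega_1+\theta_1)} & & \\ & \ddots & \\ & & e^{i(\omega_p+\theta_p)} \end{pmatrix}.$$

To make $E_\theta Y_\omega = I_p$, we merely choose $\omega_i = -\theta_i$. This defines the relationship between $E$ and $Y$ uniquely. However, we still have to fix the value of $Y$, so that the transformation between $(A, B)$ and $(E, F)$ is unique. For this reason, we choose $\omega_k$ so that $e^{-i\omega_k} e_{k1} > 0$, where $e_{k1}$ is the first element of the row $e_k$. We can always do this.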
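The phase freedom, and the convention that removes it, can be seen numerically. The sketch rescales the columns of $Y$ by random phases $e^{i\omega_k}$, confirms that $Y^H G Y = I$ and $Y^H A Y = F$ still hold, that the rows of $E = Y^{-1}$ pick up the opposite phases $e^{-i\omega_k}$, and that normalizing each row of $E$ so its first entry is real and positive gives the same $E$ whatever the $\omega_k$ were. Making the first entry of each row positive is the convention assumed here for illustration; the helper names are hypothetical.

```python
import numpy as np

def simultaneous_diagonalize(a, b):
    """Return Y and F = diag(f) with Y^H (A + B) Y = I and Y^H A Y = F."""
    r_inv = np.linalg.inv(np.linalg.cholesky(a + b))
    f, v = np.linalg.eigh(r_inv @ a @ r_inv.conj().T)
    return r_inv.conj().T @ v, np.diag(f)

def fix_row_phases(e):
    """Multiply row k of E by exp(-i arg e_k1) so that every row's first entry is real and positive."""
    return np.exp(-1j * np.angle(e[:, 0]))[:, None] * e

rng = np.random.default_rng(5)
p = 3
z_a = (rng.standard_normal((p, 7)) + 1j * rng.standard_normal((p, 7))) / np.sqrt(2)
z_b = (rng.standard_normal((p, 9)) + 1j * rng.standard_normal((p, 9))) / np.sqrt(2)
A, B = z_a @ z_a.conj().T, z_b @ z_b.conj().T
G = A + B

Y, F = simultaneous_diagonalize(A, B)
omega = rng.uniform(0.0, 2 * np.pi, size=p)
Y_omega = Y * np.exp(1j * omega)                 # column k scaled by e^{i omega_k}
E = np.linalg.inv(Y)
E_omega = np.linalg.inv(Y_omega)                 # equals diag(e^{-i omega}) E
E_fixed = fix_row_phases(E)
print(np.round(E_fixed[:, 0], 6))                # real, positive first entries
```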

Now we want to evaluate the Jacobian of the transformation $J[(A, B) \to (E, F)]$. Let's summarize our transformations. $A = E^H F E$ implies

$$(dA) = (dE^H)FE + E^H(dF)E + E^H F(dE).$$

The transformation $G = E^H E$ implies

$$(dG) = (dE^H)E + E^H(dE).$$

Multiply on the left by $(E^H)^{-1}$ and on the right by $E^{-1}$ to obtain $(d\tilde A)$ and $(d\tilde G)$ as follows:

$$(d\tilde A) = (E^H)^{-1}(dA)E^{-1} = (E^H)^{-1}(dE^H)F + dF + F(dE)E^{-1},$$

$$(d\tilde G) = (E^H)^{-1}(dG)E^{-1} = (E^H)^{-1}(dE^H) + (dE)E^{-1}.$$

Let $(dW) = (dE)E^{-1}$. Then

$$(d\tilde A) = (dW)^H F + dF + F(dW)$$

and

$$(d\tilde G) = (dW)^H + (dW).$$

Stringing these all together, we find the joint distribution of $(A, B)$ in terms of the joint distribution of $(E, F)$:

$$f(A, B) = f(E, F)\, J_1[(A, B) \to (A, G)] \times J_2[(A, G) \to (\tilde A, \tilde G)] \times J_3[(\tilde A, \tilde G) \to (W, F)] \times J_4[(W, F) \to (E, F)].$$

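Evaluating $J_1$, note that $B = G - A$, so

$$J_1 = \left|\det\begin{pmatrix} \dfrac{\partial A}{\partial A} & \dfrac{\partial (G-A)}{\partial A} \\[1ex] \dfrac{\partial A}{\partial G} & \dfrac{\partial (G-A)}{\partial G} \end{pmatrix}\right| = \left|\det\begin{pmatrix} I & -I \\ 0 & I \end{pmatrix}\right| = 1,$$

where $\partial A/\partial A$ means the matrix formed by the partial derivatives of the independent real components of $A$ with respect to themselves, and similarly for the other blocks.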
Evaluating $J_2$, we note that since $A$ and $\tilde A$ are functionally independent of $G$ and $\tilde G$, we can write

$$J[(A, G) \to (\tilde A, \tilde G)] = J[(dA, dG) \to (d\tilde A, d\tilde G)] = J[dA \to d\tilde A]\; J[dG \to d\tilde G].$$

Since $A_1 \sim CW_p(m, \Sigma)$, we know $A_1 = A_1^H$. From the transformation $A = CA_1C^H$, we also know $A = A^H$. Similarly, $B = B^H$. From theorem 38,

$$J[dA \to d\tilde A] = |\det E|^{2p}$$

and

$$J[dG \to d\tilde G] = |\det E|^{2p}.$$

This means

$$J[(dA, dG) \to (d\tilde A, d\tilde G)] = |\det E|^{4p}.$$


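Evaluating

$$J_3[(\tilde A, \tilde G) \to (W, F)] = J[(d\tilde A, d\tilde G) \to (dW, dF)] = \left|\frac{\partial(\tilde A, \tilde G)}{\partial(W, F)}\right|$$

is a bit trickier. Writing out the elements,

$$d\tilde a_{ii} = df_i + f_i (dw_{ii})^* + f_i (dw_{ii}) = df_i + 2f_i \operatorname{Re}(dw_{ii}),$$

$$d\tilde a_{ij} = f_j (dw_{ji})^* + f_i (dw_{ij}), \quad i < j,$$

$$d\tilde g_{ii} = 2\operatorname{Re}(dw_{ii}),$$

$$d\tilde g_{ij} = (dw_{ji})^* + (dw_{ij}), \quad i < j.$$

Note that since $Y$ is not Hermitian, neither are $E$ or $dW$. However, $G$, $d\tilde A$, and $d\tilde G$ are Hermitian by construction. We separate the real and imaginary parts to compute the Jacobian for the transformation of variables. Note that $F$ is real. The subscripts $R$ and $I$ that follow refer to the real and imaginary parts of the variables:

$$d\tilde a_{ii} = df_i + 2f_i\, dw_{iiR},$$

$$d\tilde a_{ijR} = f_j\, dw_{jiR} + f_i\, dw_{ijR}, \quad i < j,$$

$$d\tilde a_{ijI} = -f_j\, dw_{jiI} + f_i\, dw_{ijI}, \quad i < j,$$

$$d\tilde g_{ii} = 2\, dw_{iiR},$$

$$d\tilde g_{ijR} = dw_{jiR} + dw_{ijR}, \quad i < j,$$

$$d\tilde g_{ijI} = -dw_{jiI} + dw_{ijI}, \quad i < j.$$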

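To compute the Jacobian more easily, define two diagonal matrices $M$ and $N$, as found in Anderson [26] (p. 527), indexed by the pairs $(i, j)$ with $i < j$:

$$M = \operatorname{diag}(f_1 I_{p-1},\ f_2 I_{p-2},\ \ldots,\ f_{p-1} I_1),$$

$$N = \operatorname{diag}(f_2, \ldots, f_p,\ f_3, \ldots, f_p,\ \ldots,\ f_p),$$

so that the diagonal entry of $M$ for the pair $(i, j)$ is $f_i$ and that of $N$ is $f_j$.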
Note that

$$\det(M - N) = \prod_{i<j} (f_i - f_j) = (f_1 - f_2)(f_1 - f_3)\cdots(f_1 - f_p)(f_2 - f_3)\cdots(f_2 - f_p)\cdots(f_{p-1} - f_p).$$

We recognize this, up to sign, as a Vandermonde determinant. Graybill [95] (p. 266) tells us that the corresponding Vandermonde matrix is

$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ f_1 & f_2 & \cdots & f_p \\ \vdots & \vdots & & \vdots \\ f_1^{p-1} & f_2^{p-1} & \cdots & f_p^{p-1} \end{pmatrix}.$$
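A quick numerical check of the pairwise construction: the sketch builds $M$ and $N$ by listing the pairs $(i, j)$, $i < j$, in lexicographic order, and compares $\det(M - N)$ with $\prod_{i<j}(f_i - f_j)$ and with the Vandermonde determinant, which equals $\prod_{i<j}(f_j - f_i) = (-1)^{p(p-1)/2}\prod_{i<j}(f_i - f_j)$. The values of $f$ are arbitrary.

```python
import numpy as np
from itertools import combinations

def m_and_n(f):
    """Diagonal M and N indexed by pairs (i, j), i < j: M carries f_i and N carries f_j."""
    pairs = list(combinations(range(len(f)), 2))
    return np.diag([f[i] for i, _ in pairs]), np.diag([f[j] for _, j in pairs])

def vandermonde_det(f):
    """Determinant of the matrix whose rows are 1, f, f^2, ..., f^(p-1)."""
    return np.linalg.det(np.vander(f, len(f), increasing=True).T)

f = np.array([0.9, 0.5, 0.2])
p = len(f)
M, N = m_and_n(f)
product = np.prod([f[i] - f[j] for i, j in combinations(range(p), 2)])
print(np.linalg.det(M - N), product, vandermonde_det(f))
```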


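We thus seek the determinant of the matrix given below for the linear change of variables. Its columns correspond to $d\tilde a_{ii}$, $d\tilde g_{ii}$, $d\tilde a_{ijR}$ $(i<j)$, $d\tilde g_{ijR}$ $(i<j)$, $d\tilde a_{ijI}$ $(i<j)$, and $d\tilde g_{ijI}$ $(i<j)$, and its rows to $df_i$, $dw_{iiR}$, $dw_{ijR}$ $(i<j)$, $dw_{ijR}$ $(i>j)$, $dw_{ijI}$ $(i<j)$, and $dw_{ijI}$ $(i>j)$:

$$\begin{pmatrix} I & 0 & 0 & 0 & 0 & 0 \\ 2F & 2I & 0 & 0 & 0 & 0 \\ 0 & 0 & M & I & 0 & 0 \\ 0 & 0 & N & I & 0 & 0 \\ 0 & 0 & 0 & 0 & M & I \\ 0 & 0 & 0 & 0 & -N & -I \end{pmatrix}.$$

The determinant of this matrix is

$$\det\begin{pmatrix} I & 0 \\ 2F & 2I \end{pmatrix} \det\begin{pmatrix} M & I \\ N & I \end{pmatrix} \det\begin{pmatrix} M & I \\ -N & -I \end{pmatrix} = 2^p \det(M - N)\det(N - M) = (-1)^{p(p-1)/2}\, 2^p\, [\det(M - N)]^2,$$

where

$$\det(N - M) = \det[(-1)(M - N)] = (-1)^{p(p-1)/2} \det(M - N).$$

The Jacobian is the absolute value of this determinant, which is

$$J_3[(d\tilde A, d\tilde G) \to (dW, dF)] = 2^p \prod_{i<j} (f_i - f_j)^2.$$

Compare this with Equation (2.9) of [137].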
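The block-determinant argument can be double-checked by brute force. The sketch builds the real matrix of the linear map $(dF, dW) \mapsto (d\tilde A, d\tilde G)$, with $d\tilde A = (dW)^H F + dF + F(dW)$ and $d\tilde G = (dW)^H + dW$, over the $2p^2$ real variables $df_i$, $\operatorname{Re}(dw_{ii})$, and the real and imaginary parts of the off-diagonal $dw_{ij}$ (the imaginary parts of the diagonal $dw_{ii}$ do not enter, as in the change of variables above), and compares its absolute determinant with $2^p \prod_{i<j}(f_i - f_j)^2$. The values of $f$ are arbitrary.

```python
import numpy as np
from itertools import combinations

def hermitian_coords(h):
    """Real coordinates of a Hermitian matrix: its diagonal, then Re and Im of the strict upper triangle."""
    iu = np.triu_indices(h.shape[0], k=1)
    return np.concatenate([np.diag(h).real, h[iu].real, h[iu].imag])

def j3_numeric(f):
    """|det| of the real-linear map (dF, dW) -> (dA~, dG~) over 2p^2 real input variables."""
    p = len(f)
    F = np.diag(f).astype(complex)
    zero = np.zeros((p, p), dtype=complex)
    directions = []
    for i in range(p):                                  # df_i
        dF = zero.copy()
        dF[i, i] = 1.0
        directions.append((dF, zero))
    for i in range(p):                                  # Re(dw_ii)
        dW = zero.copy()
        dW[i, i] = 1.0
        directions.append((zero, dW))
    for i in range(p):                                  # Re and Im of dw_ij, i != j
        for j in range(p):
            if i != j:
                for unit in (1.0, 1j):
                    dW = zero.copy()
                    dW[i, j] = unit
                    directions.append((zero, dW))
    columns = []
    for dF, dW in directions:
        dA = dW.conj().T @ F + dF + F @ dW              # dA~ = dW^H F + dF + F dW
        dG = dW.conj().T + dW                           # dG~ = dW^H + dW
        columns.append(np.concatenate([hermitian_coords(dA), hermitian_coords(dG)]))
    return abs(np.linalg.det(np.array(columns).T))

def j3_formula(f):
    """2^p prod_{i<j} (f_i - f_j)^2."""
    return 2 ** len(f) * np.prod([(f[i] - f[j]) ** 2 for i, j in combinations(range(len(f)), 2)])

f = np.array([0.9, 0.5, 0.2])
print(j3_numeric(f), j3_formula(f))
```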

Finally, consider the Jacobian

$$J_4[(W, F) \to (E, F)] = J_4[(dW, dF) \to (dE, dF)].$$

Recall that $(dW) = (dE)E^{-1}$ and that $E$ is a matrix without special structure. Therefore

$$J_4[(dW, dF) \to (dE, dF)] = J[(dW) \to (dE)] = |\det(E^{-1})|^{2p}$$

by taking the transpose of theorem 31.
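The exponent $2p$ in $J_4$ reflects the fact that $X \mapsto XE^{-1}$ is complex-linear on $p^2$ complex coordinates with complex determinant $\det(E^{-1})^p$, so its real Jacobian is $|\det(E^{-1})|^{2p}$. The sketch checks this by building the real $2p^2 \times 2p^2$ matrix of the map column by column; the test matrix $E$ is an arbitrary well-conditioned choice.

```python
import numpy as np

def right_multiplication_jacobian(b):
    """|det| of the real-linear map X -> X B on p x p complex matrices, in 2p^2 real coordinates."""
    p = b.shape[0]
    columns = []
    for i in range(p):
        for j in range(p):
            for unit in (1.0, 1j):                   # real and imaginary directions of entry (i, j)
                x = np.zeros((p, p), dtype=complex)
                x[i, j] = unit
                y = x @ b
                columns.append(np.concatenate([y.real.ravel(), y.imag.ravel()]))
    return abs(np.linalg.det(np.array(columns).T))

rng = np.random.default_rng(6)
p = 3
E = np.eye(p) + 0.3 * (rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p)))
E_inv = np.linalg.inv(E)
print(right_multiplication_jacobian(E_inv), abs(np.linalg.det(E_inv)) ** (2 * p))
```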

Now put it all together. The joint density $f(A, B)$ of random variables $A$ and $B$ is given by

$$f(A, B) = f(E, F)\, J_1 J_2 J_3 J_4 = f(E, F) \times 1 \times |\det E|^{4p}\, 2^p \prod_{i<j}(f_i - f_j)^2 \times |\det E|^{-2p} = f(E, F)\, |\det E|^{2p}\, 2^p \prod_{i<j}(f_i - f_j)^2.$$

Thus

$$J[(A, B) \to (E, F)] = |\det E|^{2p}\, 2^p \prod_{i<j}(f_i - f_j)^2$$

is the Jacobian of the transformation $A = E^H F E$ and $B = E^H(I - F)E$, where $E$ is a $p \times p$ complex matrix without special structure and $F$ is a $p \times p$ real diagonal matrix in which the ordering of the individual eigenvalues is fixed and arbitrary.

Now we bring in the distributions of $A$ and $B$. Recall that $A$ and $B$ are statistically independent, with $A \sim CW_p(m, I)$ and $B \sim CW_p(n, I)$. Thus the joint density of $A$ and $B$ is

$$f(A, B) = \frac{[\det A]^{m-p}\,[\det B]^{n-p}\operatorname{etr}[-(A + B)]}{C\Gamma_p(m)\,C\Gamma_p(n)}$$

Therefore the joint density of $E$ and $F$ is

$$g(E, F) = \frac{[\det(E^H F E)]^{m-p}\,[\det(E^H(I - F)E)]^{n-p}}{C\Gamma_p(m)\,C\Gamma_p(n)} \times \operatorname{etr}[-(E^H E)]\,2^p\,|\det E|^{2p}\prod_{i<j}(f_i - f_j)^2$$

Examine the determinants. We want to be able to rewrite $g(E, F)$ as $g_1(E)\,g_2(F)$ with $g_1(E)$ of a form that is easy to integrate. This would leave us with a function only of $F$.

$$\det(E^H F E) = \det(E^H)\det(F)\det(E) = \det(F)\det(E^H E) = \det(E^H E)\prod_{i=1}^p f_i$$

$$\det[E^H(I - F)E] = \det(I - F)\det(E^H E) = \det(E^H E)\prod_{i=1}^p (1 - f_i)$$

$$|\det(E)|^2 = \det(E^H E)$$

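Substituting into $g(E, F)$ we obtain

$$g(E, F) = \frac{2^p}{C\Gamma_p(m)\,C\Gamma_p(n)}\left[\prod_{i=1}^p f_i^{m-p}(1 - f_i)^{n-p}\prod_{i<j}(f_i - f_j)^2\right][\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)$$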

By the factorization theorem, we know that $E$ and $F$ are independent. Notice that $\det(E^H E)$ and $\operatorname{etr}(-E^H E)$ are “generalized even” functions of $E$ (see definition 84).

If we had not restricted $e_{i1} > 0$, then we would recognize that

$$\int [\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)\,(dE)$$

is, apart from the normalizing constant $\pi^{p^2}$, the expected value of $[\det(E^H E)]^{m+n-p}$ when $E$ is distributed as $CN_{p,p}(0, I, I)$.
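Consider multiplying the $k$th row $\mathbf{e}_k$ of $E$ by a phase factor $e^{i\omega_k}$. Then

$$E^H E = \begin{pmatrix} e^{-i\omega_1}\mathbf{e}_1^H & \cdots & e^{-i\omega_p}\mathbf{e}_p^H \end{pmatrix}\begin{pmatrix} e^{i\omega_1}\mathbf{e}_1 \\ \vdots \\ e^{i\omega_p}\mathbf{e}_p \end{pmatrix} = \mathbf{e}_1^H\mathbf{e}_1 + \cdots + \mathbf{e}_p^H\mathbf{e}_p$$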


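Thus $E^H E$ is invariant as a function of $\{\omega_k\}$. Therefore, when

$$[\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)$$

is integrated over $E$ where we restrict $\omega_k$ so that $e^{i\omega_k}e_{k1} > 0$, we get the same answer as when we integrate

$$\left(\frac{1}{2\pi}\right)^p[\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)$$

without restriction on $\omega_k$. For each $k$, we observe

$$\int_0^{2\pi} d\omega_k = 2\pi, \qquad 1 \le k \le p$$

Thus we know to consider

$$\left(\frac{1}{2\pi}\right)^p\int[\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)\,(dE)$$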

Now, since we want to evaluate this integral, we consider $E$ as being distributed as $CN_{p,p}(0, I, I)$. When this is true, we know from the definition of the complex Wishart distribution that

$$G = E^H E \sim CW_p(p, I)$$
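By theorem 79, we know that

$$\mathcal{E}\{|G|^{m+n-p}\} = \frac{C\Gamma_p(m + n)}{C\Gamma_p(p)}$$

Thus the integral above gives us the scaled expected value

$$\left(\frac{1}{2\pi}\right)^p\int[\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)\,(dE) = \left(\frac{1}{2\pi}\right)^p\pi^{p^2}\,\frac{C\Gamma_p(m + n)}{C\Gamma_p(p)} = \frac{2^{-p}\pi^{p(p-1)}\,C\Gamma_p(m + n)}{C\Gamma_p(p)}$$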
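The moment formula from theorem 79 is easy to spot-check by simulation. This Python sketch assumes the usual complex multivariate gamma $C\Gamma_p(n) = \pi^{p(p-1)/2}\prod_{i=1}^p\Gamma(n-i+1)$ (consistent with $C\Gamma_2(n) = \pi(n-1)!(n-2)!$ used in the proof of Theorem 8) and draws $E \sim CN_{2,2}(0, I, I)$ with unit-variance entries, so that $G = E^H E \sim CW_2(2, I)$. The exponent `h` stands in for $m + n - p$; the function names and the seed are illustrative.

```python
import math
import random


def complex_mgamma(n, p):
    """Assumed C Gamma_p(n) = pi^{p(p-1)/2} * prod_{i=1}^p Gamma(n - i + 1)."""
    return math.pi ** (p * (p - 1) / 2) * math.prod(math.gamma(n - i + 1) for i in range(1, p + 1))


def complex_normal(rng):
    """Circular complex normal with E|z|^2 = 1, i.e. density exp(-|z|^2) / pi."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def mc_det_moment(h, trials, seed=1):
    """Monte Carlo estimate of E[det(E^H E)^h] for a 2 x 2 E ~ CN_{2,2}(0, I, I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        e11, e12, e21, e22 = (complex_normal(rng) for _ in range(4))
        total += abs(e11 * e22 - e12 * e21) ** (2 * h)   # det(E^H E) = |det E|^2
    return total / trials


p, h = 2, 1
exact = complex_mgamma(p + h, p) / complex_mgamma(p, p)   # C Gamma_p(p + h) / C Gamma_p(p)
estimate = mc_det_moment(h, 100_000)
print(exact, estimate)
```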

When we consider the integration when $e_{i1}$ is not restricted, we find

$$\int\frac{C\Gamma_p(p)}{\pi^{p^2}\,C\Gamma_p(m + n)}[\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)\,(dE) = \int f_1(E)\,(dE) = 1$$

$f_1(E)$ is a density function when $e_{i1}$ is not restricted. We want the density of $E$ when $e_{i1} > 0$. Recall that since

$$[\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)$$

is a generalized even function, we multiplied our function by $(1/2\pi)^p$ and extended the region of integration. To recover the desired density function, we want $f(E) = (2\pi)^p f_1(E)$. Thus

$$f(E) = \frac{2^p\,C\Gamma_p(p)\,[\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)}{\pi^{p(p-1)}\,C\Gamma_p(m + n)}$$

is the density of $E$ when $e_{i1} > 0$.

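Since $E$ and $F$ are statistically independent, we find $f(F)$ by dividing the joint density $g(E, F)$ by $f(E)$ as follows.

$$f(F) = \frac{g(E, F)}{f(E)} = \frac{\dfrac{2^p}{C\Gamma_p(m)\,C\Gamma_p(n)}\left[\prod_{i=1}^p f_i^{m-p}(1 - f_i)^{n-p}\prod_{i<j}(f_i - f_j)^2\right][\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)}{\dfrac{2^p\,C\Gamma_p(p)}{\pi^{p(p-1)}\,C\Gamma_p(m + n)}[\det(E^H E)]^{m+n-p}\operatorname{etr}(-E^H E)}$$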

Simplifying, we get

$$f(F) = \frac{\pi^{p(p-1)}\,C\Gamma_p(m + n)}{C\Gamma_p(m)\,C\Gamma_p(n)\,C\Gamma_p(p)}\left[\prod_{i=1}^p f_i^{m-p}(1 - f_i)^{n-p}\right]\prod_{i<j}(f_i - f_j)^2 \qquad (6.3)$$


where $1 > f_i > 0$ for each $i$, since $f_i = l_i^2/(1 + l_i^2)$ in the original problem, and the ordering of the $\{f_i\}$ is as fixed earlier in the derivation. Strict inequality is specified because Okamoto [197] showed that the probability of two sample eigenvalues being equal is zero.
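Equation (6.3) can be coded directly. The sketch below uses the same assumed definition of $C\Gamma_p$ as before (`eigen_density_63` and `midpoint` are illustrative names). It checks only the scalar case $p = 1$, where (6.3) reduces to the Beta$(m, n)$ density and no question of eigenvalue ordering arises.

```python
import math


def complex_mgamma(n, p):
    """Assumed C Gamma_p(n) = pi^{p(p-1)/2} * prod_{i=1}^p Gamma(n - i + 1)."""
    return math.pi ** (p * (p - 1) / 2) * math.prod(math.gamma(n - i + 1) for i in range(1, p + 1))


def eigen_density_63(fs, m, n):
    """Equation (6.3) at fs = (f_1, ..., f_p), each in (0, 1); assumes m, n >= p."""
    p = len(fs)
    const = (math.pi ** (p * (p - 1)) * complex_mgamma(m + n, p)
             / (complex_mgamma(m, p) * complex_mgamma(n, p) * complex_mgamma(p, p)))
    value = const
    for f in fs:
        value *= f ** (m - p) * (1.0 - f) ** (n - p)
    for i in range(p):
        for j in range(i + 1, p):
            value *= (fs[i] - fs[j]) ** 2
    return value


def midpoint(func, a, b, steps):
    """Composite midpoint rule on [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))


# p = 1: (6.3) collapses to the Beta(m, n) density, so the total mass should be 1.
mass = midpoint(lambda f: eigen_density_63([f], 3, 4), 0.0, 1.0, 20_000)
print(mass)
```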

The density of $L^2$ is obtained from $f(F)$ using

$$f_i = \frac{l_i^2}{1 + l_i^2}$$


Following Anderson (p. 530) [26], we note that

$$\frac{df_i}{dl_i^2} = [(1 + l_i^2) - l_i^2](1 + l_i^2)^{-2} = \frac{1}{(1 + l_i^2)^2}$$


Thus

$$J[F \to L^2] = \prod_{i=1}^p\left(\frac{1}{1 + l_i^2}\right)^2$$


We also note that

$$(f_i - f_j)^2 = \left(\frac{l_i^2}{1 + l_i^2} - \frac{l_j^2}{1 + l_j^2}\right)^2 = \frac{(l_i^2 - l_j^2)^2}{[(1 + l_i^2)(1 + l_j^2)]^2}$$


and

$$1 - f_i = 1 - \frac{l_i^2}{1 + l_i^2} = \frac{1}{1 + l_i^2}$$


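Note that

$$\prod_{i<j}[(1 + l_i^2)(1 + l_j^2)]^2 = \prod_{i=1}^p(1 + l_i^2)^{2(p-1)}$$

since each index $i$ appears in exactly $p - 1$ of the pairs.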


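Performing the change of variables, we get

$$g(L^2) = \frac{\pi^{p(p-1)}\,C\Gamma_p(m + n)}{C\Gamma_p(m)\,C\Gamma_p(n)\,C\Gamma_p(p)}\left[\prod_{i=1}^p\left(\frac{l_i^2}{1 + l_i^2}\right)^{m-p}\left(\frac{1}{1 + l_i^2}\right)^{n-p}\right]\frac{\prod_{i<j}(l_i^2 - l_j^2)^2}{\prod_{i=1}^p(1 + l_i^2)^{2(p-1)}}\prod_{i=1}^p\left(\frac{1}{1 + l_i^2}\right)^2$$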

At this point we note that $2p - m - n - 2 - 2(p - 1) = -(m + n)$. Then

$$f(L^2) = p!\,g(L^2)$$

where

$$g(L^2) = \frac{\pi^{p(p-1)}\,C\Gamma_p(m + n)}{C\Gamma_p(m)\,C\Gamma_p(n)\,C\Gamma_p(p)}\,\frac{\prod_{i=1}^p(l_i^2)^{m-p}\prod_{i<j}(l_i^2 - l_j^2)^2}{\prod_{i=1}^p(1 + l_i^2)^{m+n}}$$

The final form, $f(L^2) = p!\,g(L^2)$, accounts for the procedure of sorting the randomly selected unordered eigenvalues. This final form is the density function for the set of sorted sample eigenvalues. This is the appropriate density for deriving the density of test statistics based on these sample eigenvalues. This is the starting point for determining the number of significant sources for the MUSIC algorithm.
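The change of variables from $F$ to $L^2$ can be checked pointwise without worrying about normalizing constants: the unnormalized kernel of (6.3), evaluated at $f_i = l_i^2/(1 + l_i^2)$ and multiplied by $J[F \to L^2]$, should equal the unnormalized kernel of $g(L^2)$. The sample values of $l_i^2$, $m$, and $n$ below are arbitrary choices for illustration.

```python
import math


def kernel_f(fs, m, n):
    """Unnormalized (6.3): prod f_i^{m-p} (1 - f_i)^{n-p} * prod_{i<j} (f_i - f_j)^2."""
    p = len(fs)
    out = math.prod(f ** (m - p) * (1.0 - f) ** (n - p) for f in fs)
    for i in range(p):
        for j in range(i + 1, p):
            out *= (fs[i] - fs[j]) ** 2
    return out


def kernel_l2(xs, m, n):
    """Unnormalized g(L^2), x_i = l_i^2: prod x_i^{m-p} / (1 + x_i)^{m+n} * prod (x_i - x_j)^2."""
    p = len(xs)
    out = math.prod(x ** (m - p) / (1.0 + x) ** (m + n) for x in xs)
    for i in range(p):
        for j in range(i + 1, p):
            out *= (xs[i] - xs[j]) ** 2
    return out


m, n = 5, 4
xs = [2.5, 0.8, 0.1]                                       # illustrative values of l_i^2
fs = [x / (1.0 + x) for x in xs]                           # f_i = l_i^2 / (1 + l_i^2)
jacobian = math.prod(1.0 / (1.0 + x) ** 2 for x in xs)     # J[F -> L^2]
lhs = kernel_f(fs, m, n) * jacobian
rhs = kernel_l2(xs, m, n)
print(math.isclose(lhs, rhs, rel_tol=1e-12))               # True
```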

6.2  Tests  Based  on  One  Set  of  Samples 

In general, the distribution of a test statistic formed from dependent random variables will be more complicated to evaluate than the distribution of a test statistic formed from independent random variables. When the random variables are dependent, then you must know or assume information about that dependency.

6.2.1 Tests that Require Specifying the Population Covariance Matrix

In this section, I am interested in finding the density function of the ratio of linear combinations of sample eigenvalues of a $p \times p$ complex Wishart matrix,

$$\frac{c_1^T L^2 c_1}{c_2^T L^2 c_2}$$


The test considered here requires that the population covariance matrix $\Sigma$ be specified. This differs from a test where $\Sigma$ might cancel out in the forming of the density function of the test statistic. As an intermediate result, consider $W \sim CW_2(n, \Sigma)$ where $W = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$. Then, apply the expression for the density function for the complex Wishart distribution.


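Theorem 8 Let $W \sim CW_2(n, \Sigma)$ where $W = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ and $n > 1$. Let $x = a/b$. Then the density function of $x$ is given by

$$g(x) = \frac{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^{n-2}\,|x|^{n-2}}{\pi\,\beta[(n-1),(n-2)]\,(x\Sigma_{22} + \Sigma_{11})^{2(n-1)}} \qquad (6.4)$$

where $\beta(\cdot,\cdot)$ is the beta function.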


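Proof. We know that the density $f(W)$ is given by

$$f(W) = \frac{|\det W|^{n-2}\operatorname{etr}\left[-\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1}\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}\right]}{\det\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{n}C\Gamma_2(n)}$$

When we evaluate the determinants, we obtain Equation 6.6.

$$f(W) = \frac{|ab|^{n-2}\operatorname{etr}\left[-\dfrac{1}{\Sigma_{11}\Sigma_{22} - \Sigma_{21}\Sigma_{12}}\begin{pmatrix} \Sigma_{22} & -\Sigma_{12} \\ -\Sigma_{21} & \Sigma_{11} \end{pmatrix}\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}\right]}{(\Sigma_{11}\Sigma_{22} - \Sigma_{21}\Sigma_{12})^n\,C\Gamma_2(n)} \qquad (6.6)$$

$$= \frac{|ab|^{n-2}\operatorname{etr}\left[-\dfrac{1}{\Sigma_{11}\Sigma_{22} - \Sigma_{21}\Sigma_{12}}\begin{pmatrix} a\Sigma_{22} & -b\Sigma_{12} \\ -a\Sigma_{21} & b\Sigma_{11} \end{pmatrix}\right]}{(\Sigma_{11}\Sigma_{22} - \Sigma_{21}\Sigma_{12})^n\,C\Gamma_2(n)} \qquad (6.7)$$

$$= \frac{|ab|^{n-2}\exp\left[-\dfrac{a\Sigma_{22} + b\Sigma_{11}}{\Sigma_{11}\Sigma_{22} - \Sigma_{21}\Sigma_{12}}\right]}{(\Sigma_{11}\Sigma_{22} - \Sigma_{21}\Sigma_{12})^n\,C\Gamma_2(n)} \qquad (6.8)$$

Notice that since $\Sigma = \Sigma^H$, we have $\Sigma_{21}\Sigma_{12} = |\Sigma_{12}|^2$, so

$$f(W) = f(a, b) = \frac{|ab|^{n-2}\exp\left[-\dfrac{a\Sigma_{22} + b\Sigma_{11}}{\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2}\right]}{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^n\,C\Gamma_2(n)} \qquad (6.9)$$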

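We can make the change of variables $x = a/b$ and $y = b$. The inverse relations are given by $b = y$ and $a = xy$. The Jacobian of the transformation is given by

$$J = \det\begin{pmatrix} \dfrac{\partial a}{\partial x} & \dfrac{\partial a}{\partial y} \\[2mm] \dfrac{\partial b}{\partial x} & \dfrac{\partial b}{\partial y} \end{pmatrix} = \det\begin{pmatrix} \dfrac{\partial(xy)}{\partial x} & \dfrac{\partial(xy)}{\partial y} \\[2mm] \dfrac{\partial y}{\partial x} & \dfrac{\partial y}{\partial y} \end{pmatrix} = \det\begin{pmatrix} y & x \\ 0 & 1 \end{pmatrix} = y$$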

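Then

$$f(a, b) = f(xy, y)\,|J| = g(x, y) = \frac{|xy^2|^{n-2}\,|y|\exp\left[-y\,\dfrac{x\Sigma_{22} + \Sigma_{11}}{\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2}\right]}{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^n\,C\Gamma_2(n)} \qquad (6.10)$$

$$= \frac{|x|^{n-2}\,|y|^{2n-3}\exp\left[-y\,\dfrac{x\Sigma_{22} + \Sigma_{11}}{\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2}\right]}{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^n\,C\Gamma_2(n)} \qquad (6.11)$$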

Now integrate out $y$ to find $g(x)$. Since $b > 0$, we know that $0 < y < \infty$. Concentrating just on the function of $y$, evaluate the integral

$$I = \int_0^\infty y^{2n-3}\exp(-hy)\,dy \qquad (6.12)$$

where $h$ is a function of $x$ which is held constant during the integration, given by

$$h = \frac{x\Sigma_{22} + \Sigma_{11}}{\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2}$$


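By corollary 48 we know

$$I = \int_0^\infty y^{2n-3}e^{-hy}\,dy = \frac{(2n - 3)!}{h^{2n-2}}$$

This gives us our expression for $g(x)$, where we recall that the beta function is $\beta(m, n) = \dfrac{m!\,n!}{(m + n)!}$ and $x > 0$.

$$g(x) = \frac{|x|^{n-2}}{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^n\,C\Gamma_2(n)}\,(2n - 3)!\left(\frac{x\Sigma_{22} + \Sigma_{11}}{\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2}\right)^{-2(n-1)}$$

$$= \frac{|x|^{n-2}\,(2n - 3)!\,(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^{2(n-1)}}{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^n\,\pi\,(n - 1)!\,(n - 2)!\,(x\Sigma_{22} + \Sigma_{11})^{2(n-1)}}$$

$$= \frac{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^{n-2}\,|x|^{n-2}}{\pi\,\beta[(n-1),(n-2)]\,(x\Sigma_{22} + \Sigma_{11})^{2(n-1)}}$$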
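The step from (6.11) to the closed form of Theorem 8 can be verified numerically by integrating $y$ out with a simple midpoint rule. In this sketch, `thesis_beta` implements $\beta(m, n) = m!\,n!/(m + n)!$ as defined above, $C\Gamma_2(n)$ is computed from the assumed definition $\pi^{p(p-1)/2}\prod_i\Gamma(n-i+1)$, the values of $\Sigma_{11}$, $\Sigma_{22}$, $\Sigma_{12}$ are arbitrary, and the cutoff of the $y$ integral is a heuristic that is adequate only for small $n$.

```python
import math


def complex_mgamma(n, p):
    """Assumed C Gamma_p(n) = pi^{p(p-1)/2} * prod_{i=1}^p Gamma(n - i + 1)."""
    return math.pi ** (p * (p - 1) / 2) * math.prod(math.gamma(n - i + 1) for i in range(1, p + 1))


def thesis_beta(a, b):
    """beta(a, b) = a! b! / (a + b)!, the form used in Theorem 8."""
    return math.factorial(a) * math.factorial(b) / math.factorial(a + b)


def g_xy(x, y, n, s11, s22, s12):
    """Equation (6.11): joint function of x = a/b and y = b (x, y > 0)."""
    d = s11 * s22 - abs(s12) ** 2
    h = (x * s22 + s11) / d
    return x ** (n - 2) * y ** (2 * n - 3) * math.exp(-h * y) / (d ** n * complex_mgamma(n, 2))


def g_closed(x, n, s11, s22, s12):
    """Closed form of Theorem 8 for x > 0."""
    d = s11 * s22 - abs(s12) ** 2
    return (d ** (n - 2) * x ** (n - 2)
            / (math.pi * thesis_beta(n - 1, n - 2) * (x * s22 + s11) ** (2 * (n - 1))))


def g_numeric(x, n, s11, s22, s12, steps=20_000):
    """Integrate y out of (6.11) with a midpoint rule on [0, 80/h]."""
    d = s11 * s22 - abs(s12) ** 2
    upper = 80.0 / ((x * s22 + s11) / d)     # exp(-h y) is negligible past h*y = 80 for small n
    dy = upper / steps
    return dy * sum(g_xy(x, (k + 0.5) * dy, n, s11, s22, s12) for k in range(steps))


n, s11, s22, s12 = 4, 2.0, 1.0, 0.3 + 0.4j   # illustrative partitions of Sigma
x = 1.5
print(g_closed(x, n, s11, s22, s12), g_numeric(x, n, s11, s22, s12))
```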


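Theorem 9 Let $W \sim CW_2(n, \Sigma)$ where $W = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$, $a > 0$, $b > 0$, and $n > 1$. Let $x = a/b$. Then the upper-tail (complementary) distribution function is given by

$$\Pr\{x > c\} = \int_c^\infty g(x)\,dx = \frac{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^{n-2}}{\pi\,\beta[(n-1),(n-2)]}\left(\frac{1}{\Sigma_{22}}\right)^{n-1}\sum_{k=0}^{n-2}\binom{n-2}{k}\left(\frac{1}{n + k - 1}\right)\frac{(-\Sigma_{11})^k}{(c\Sigma_{22} + \Sigma_{11})^{n+k-1}}$$

This theorem is supplied by me.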
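Proof. By theorem 8 the density function is

$$g(x) = \frac{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^{n-2}\,|x|^{n-2}}{\pi\,\beta[(n-1),(n-2)]\,(x\Sigma_{22} + \Sigma_{11})^{2(n-1)}}$$

Since $a > 0$ and $b > 0$, we know $x$ is real and positive. This permits us to drop the absolute value signs. Then, apply theorem 144. We note that since $n > 1$ we can use the solution for $k < p - 1$ in that theorem statement. Then

$$\int_c^\infty\frac{x^{n-2}}{(x\Sigma_{22} + \Sigma_{11})^{2(n-1)}}\,dx = \left(\frac{1}{\Sigma_{22}}\right)^{n-1}\sum_{k=0}^{n-2}\binom{n-2}{k}\left(\frac{1}{n + k - 1}\right)\frac{(-\Sigma_{11})^k}{(c\Sigma_{22} + \Sigma_{11})^{n+k-1}}$$

The full answer is

$$F(x) = \Pr\{x > c\} = \frac{(\Sigma_{11}\Sigma_{22} - |\Sigma_{12}|^2)^{n-2}}{\pi\,\beta[(n-1),(n-2)]}\left(\frac{1}{\Sigma_{22}}\right)^{n-1}\sum_{k=0}^{n-2}\binom{n-2}{k}\left(\frac{1}{n + k - 1}\right)\frac{(-\Sigma_{11})^k}{(c\Sigma_{22} + \Sigma_{11})^{n+k-1}}$$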
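Theorem 9's finite sum can be compared against direct numerical integration of $g(x)$ over $[c, \infty)$. The sketch maps the infinite range to a finite one with $t = 1/(1 + x)$ and, as an illustration of the dB convention used later in this section, takes a 3 dB threshold, $c = 10^{3/10}$. All parameter values and function names are illustrative assumptions; `thesis_beta` again means $m!\,n!/(m + n)!$.

```python
import math


def thesis_beta(a, b):
    """beta(a, b) = a! b! / (a + b)!, as in Theorem 8."""
    return math.factorial(a) * math.factorial(b) / math.factorial(a + b)


def g_closed(x, n, s11, s22, s12):
    """Theorem 8 density of x = a/b for x > 0."""
    d = s11 * s22 - abs(s12) ** 2
    return (d ** (n - 2) * x ** (n - 2)
            / (math.pi * thesis_beta(n - 1, n - 2) * (x * s22 + s11) ** (2 * (n - 1))))


def tail_closed(c, n, s11, s22, s12):
    """Theorem 9: Pr{x > c} as a finite sum over k = 0, ..., n - 2."""
    d = s11 * s22 - abs(s12) ** 2
    lead = d ** (n - 2) / (math.pi * thesis_beta(n - 1, n - 2)) * (1.0 / s22) ** (n - 1)
    total = sum(math.comb(n - 2, k) * (-s11) ** k
                / ((n + k - 1) * (c * s22 + s11) ** (n + k - 1))
                for k in range(n - 1))
    return lead * total


def tail_numeric(c, n, s11, s22, s12, steps=50_000):
    """Integrate g over [c, inf) after substituting t = 1/(1 + x), dx = dt / t^2."""
    upper = 1.0 / (1.0 + c)
    dt = upper / steps
    acc = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        acc += g_closed(1.0 / t - 1.0, n, s11, s22, s12) / t ** 2
    return acc * dt


n, s11, s22, s12 = 4, 2.0, 1.0, 0.3 + 0.4j
c = 10 ** (3.0 / 10)            # 3 dB threshold in linear units, x = 10^{d/10}
print(tail_closed(c, n, s11, s22, s12), tail_numeric(c, n, s11, s22, s12))
```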


Corollary 2 Let $W = UL^2U^H$ be the eigenvalue decomposition of $W \sim CW_p(n, \Sigma, \Theta)$. Then $U^H W U = L^2$ is distributed according to

$$CW_p(n, U^H\Sigma U, U^H\Theta U)$$
Note that $U$ consists of the eigenvectors of $W$, which generally are not the eigenvectors of $\Sigma$. Thus $\Sigma$ is not diagonalized by this transformation. $CW_p(n, U^H\Sigma U)$ is the distribution of the sample eigenvalues of $W = X^H X$ where $X$ has the complex matrix normal distribution $CN_{n,p}(0, I, \Sigma)$. This is an application of theorem 54.


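Corollary 3 Suppose that we define a $p \times 2$ matrix $C = (c_1, c_2)$ such that the Hadamard product $c_1 \circ c_2 = 0$. Then look at $C^T L^2 C$. Suppose

$$C = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} = (c_1, c_2).$$

Then

$$C^T L^2 C = \begin{pmatrix} l_1^2 + l_2^2 & 0 \\ 0 & l_{p-1}^2 + l_p^2 \end{pmatrix} \qquad (6.13)$$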

This is now distributed according to the second order complex Wishart distribution,

$$C^T L^2 C \sim CW_2(n, C^T U^H\Sigma UC, C^T U^H\Theta UC)$$

If we know that the mean of the complex multivariate normal distribution is zero, then $\Theta = 0$, and the third term in the distribution notation is omitted.
Let $a = c_1^T L^2 c_1 = l_1^2 + l_2^2$ and $b = c_2^T L^2 c_2 = l_{p-1}^2 + l_p^2$. Then Equation 6.4 is the probability density function of the ratio of two disjoint linear combinations of eigenvalues of the sample covariance matrix, where the underlying data sample of size $n$ is distributed according to the zero-mean vector complex normal distribution $CN_p(0, \Sigma)$. The subscripted values $\Sigma_{11}$, $\Sigma_{12}$, and $\Sigma_{22}$ refer to the partitions of $C^T U^H\Sigma UC$ and not to partitions of the original population covariance matrix $\Sigma$.

Although this density function is for the simple test statistic

$$\frac{c_1^T L^2 c_1}{c_2^T L^2 c_2}$$

interpreting this statistic is less straightforward than interpreting a modification of it. Instead, consider the average of the sample eigenvalues that make up $a$ and $b$. Let $m_1$ be the number of sample eigenvalues picked by $c_1^T L^2 c_1$, and let $m_2$ be the number of sample eigenvalues picked by $c_2^T L^2 c_2$. Look at the test statistic

$$T_{14} = \frac{c_1^T L^2 c_1/m_1}{c_2^T L^2 c_2/m_2} = \frac{m_2\,c_1^T L^2 c_1}{m_1\,c_2^T L^2 c_2}$$

When all the sample eigenvalues are equal, the test statistic $T_{14} = 1$. The further $T_{14}$ is from 1, the more the averages of the two sets of sample eigenvalues differ. Thus, when $T_{14}$ is very close to 1, we expect the statement “the corresponding population eigenvalues are all equal” to have a small chance of being in error.

So that the computation of the density of the test statistic is not altered, account for the weighting done in the averaging process in the vectors $c_1$ and $c_2$. Define a $p \times 2$ matrix $C = (c_1, c_2)$ such that the elements of $c_k$ pick out the sample eigenvalues of interest, and the nonzero entries are the reciprocal of the square root of the number of sample eigenvalues extracted, so that each $c_k^T L^2 c_k$ is an average. Let the sets of sample eigenvalues chosen be disjoint. Then the Hadamard product $c_1 \circ c_2 = 0$. Look at $C^T L^2 C$. Suppose

$$C = \begin{pmatrix} 1/\sqrt{m_1} & 0 \\ 1/\sqrt{m_1} & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ 0 & 1/\sqrt{m_2} \\ 0 & 1/\sqrt{m_2} \end{pmatrix} = (c_1, c_2). \qquad (6.14)$$

Then

$$C^T L^2 C = \begin{pmatrix} (l_1^2 + l_2^2)/m_1 & 0 \\ 0 & (l_{p-1}^2 + l_p^2)/m_2 \end{pmatrix}$$

For this simple example, $m_1 = m_2 = 2$. Then $T_{14} = x = a/b$, where $a$ and $b$ are now the two diagonal entries above.
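A small numerical example shows why the nonzero entries of $c_k$ are $1/\sqrt{m_k}$: each quadratic form $c_k^T L^2 c_k$ then becomes an average, and disjoint selections give a zero off-diagonal entry. The eigenvalues below are made-up illustrative values, and `averaging_matrix` and `ct_l2_c` are hypothetical helpers.

```python
import math


def averaging_matrix(p, m1, m2):
    """p x 2 matrix C = (c1, c2) as in (6.14): c1 averages the m1 largest, c2 the m2 smallest."""
    c = [[0.0, 0.0] for _ in range(p)]
    for i in range(m1):
        c[i][0] = 1.0 / math.sqrt(m1)
    for i in range(p - m2, p):
        c[i][1] = 1.0 / math.sqrt(m2)
    return c


def ct_l2_c(c, eig):
    """Entries of C^T L^2 C for L^2 = diag(eig); returns (a, b, off_diagonal)."""
    a = sum(row[0] ** 2 * e for row, e in zip(c, eig))
    b = sum(row[1] ** 2 * e for row, e in zip(c, eig))
    off = sum(row[0] * row[1] * e for row, e in zip(c, eig))
    return a, b, off


eig = [9.0, 6.0, 2.0, 1.5, 1.0]        # illustrative sorted sample eigenvalues l_i^2
c = averaging_matrix(len(eig), 2, 2)
a, b, off = ct_l2_c(c, eig)
t14 = a / b                            # ratio of the two averages
print(a, b, off, t14)
```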

Now we know $x > 1$ since $\lambda_{m_1} > \lambda_{p-m_2+1}$. We are interested in testing whether $x$ is significantly greater than 1. Let $c > 1$ be some critical threshold we want to test against. If $c$ is a detection threshold, then $\Pr\{x > c\}$ is the probability of detection for a signal-to-noise ratio of $x$ in “linear” units. For $\mathrm{SNR} = d = 10\log_{10}x$, we have $x = 10^{d/10}$ for SNR given in dB. SNR here is interpreted in the sense of [49], with noise measured in the same bandwidth the signal is measured in.


6.2.2  F*  in  MUSIC 
In this section, a statistic with an F-distribution is derived for examining the decomposition of a received data set into signal-plus-noise and noise-only components, as constructed with the Multiple Signal Classification (MUSIC) technique. The motivation for this is the work by Schmidt (1986) [238] and Wax (1991) [282]. I will draw most heavily from the second paper to develop the assumptions. All of the distributional work is provided by me.

Let $a(\theta)$ be a $p \times 1$ steering vector for an array of $p$ sensors having a fixed arbitrary geometry. Assume that the signals from $q$ sources arrive at the array. Each source $s_i$ is coherently processed by a corresponding linear beamforming function $a(\theta_i)$. Assume that the stochastic signals are independent of the noise received at each sensor. This is the signal-aligned beamformer case of Monzingo and Miller [185].

Let

$$A_{[q]}(\Theta) = [a(\theta_1), a(\theta_2), \ldots, a(\theta_q)]$$

Then $A_{[q]}(\Theta)$ is a deterministic $p \times q$ complex matrix whose column vectors span the vector space which contains the signals. Note that some of the noise is also in this space. Let $s(t)$ be a $q \times 1$ vector of the signals at the array reference point at time $t$. Let $n(t)$ be a $p \times 1$ vector of the random noise appearing at time $t$ at each sensor. Then, let the beamformer output for signals arriving at the array reference point at time $t$ be given by

$$x(t) = A_{[q]}(\Theta)\,s(t) + n(t)$$

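Obtain independent samples from the sensors at $M$ different times,

$$t_1, t_2, \ldots, t_M$$

Let

$$S_{q\times M} = [s(t_1), \ldots, s(t_M)], \qquad N_{p\times M} = [n(t_1), \ldots, n(t_M)], \qquad X_{p\times M} = [x(t_1), \ldots, x(t_M)]$$

Then

$$X = A_{[q]}(\Theta)\,S + N$$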

To complete the problem description, we need to know something about the distributions of $S$ and $N$. Let the noise matrix $N$ be distributed according to the matrix complex normal distribution having a mean of zero and row covariance $\Sigma$. Thus $N \sim CN_{p\times M}(0, \Sigma, I_M)$. Let the signal matrix $S$ be distributed according to the matrix complex normal distribution having a mean of zero and row covariance $R$. This is stated as $S \sim CN_{q\times M}(0, R_{q\times q}, I_M)$. By theorem 41, we know

$$A_{[q]}(\Theta)_{p\times q}\,S \sim CN_{p,M}(0, ARA^H, I_M)$$

We sum the independent random variables according to theorem 48 to get

$$X = A_{[q]}(\Theta)\,S + N \sim CN_{p,M}(0 + 0, \Sigma + ARA^H, I_M + I_M) = CN_{p,M}(0, \Sigma + ARA^H, 2I_M)$$
Note that the presence of the scalar 2 came from the sum of the two column covariance matrices, which are each $I_M$. By lemma 13 we know that the row and column covariance matrices are not unique. This is good here because scalar multiples commute between these matrices. Therefore,

$$X \sim CN_{p,M}(0, 2(\Sigma + ARA^H), I_M)$$

We will need the column covariance matrix to be an identity matrix to form a complex Wishart distributed random variable.

The next step in MUSIC is to find an orthonormal basis for the space spanned by the beamformer when adjusted to coherently process signals with parameters $\theta_1, \ldots, \theta_q$, which we usually associate with direction (but this association does not strictly have to hold). We find the required orthonormal basis by performing a QR decomposition of $A_{[q]}(\Theta)$. Recall that $Q$ is the orthonormal matrix obtainable by the inner product version of the Gram-Schmidt process. Because the symbol $R$ is already in use, let the triangular matrix factor from the QR decomposition be $T$. Then by proposition 67, $A = QT$ where $Q^H Q = I_q$ and $T$ is an upper triangular $q \times q$ matrix with positive real elements on the diagonal. Alternately, we can apply proposition 71 to get $A = QT$ where $T$ is a lower triangular $q \times q$ matrix.

The matrix $Q$ is called subunitary, and it forms an orthonormal basis for the space spanned by the columns of the signal-directed beamformer $A_{[q]}(\Theta)$. We can continue to construct vectors orthonormal to $Q$ and mutually orthonormal to each other until we have a set of $p$ vectors. These last $(p - q)$ vectors form an orthonormal basis for the space orthogonal to the space spanned by the columns of $A$. Then we have the $p \times p$ unitary matrix

$$G = [Q, V]$$
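The construction of $G = [Q, V]$ can be sketched with Gram-Schmidt in plain Python. The thesis allows an arbitrary array geometry; purely for illustration, this sketch assumes a hypothetical half-wavelength uniform linear array with $a_k(\theta) = e^{i\pi k\sin\theta}$, runs Gram-Schmidt (with one reorthogonalization pass for numerical safety) on the steering vectors, and then on the standard basis vectors until $p$ orthonormal vectors are found. The check at the end confirms $V^H A \approx 0$; the angles and sizes are arbitrary.

```python
import cmath
import math


def steering_vector(theta, p):
    """Hypothetical half-wavelength ULA: a_k(theta) = exp(i*pi*k*sin(theta)), k = 0..p-1."""
    return [cmath.exp(1j * math.pi * k * math.sin(theta)) for k in range(p)]


def inner(u, v):
    """Complex inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def unitary_completion(columns, p, tol=1e-8):
    """Gram-Schmidt on the columns of A, then on e_1, ..., e_p, until p orthonormal vectors exist."""
    basis = []
    candidates = [list(col) for col in columns]
    candidates += [[1.0 if i == k else 0.0 for i in range(p)] for k in range(p)]
    for vec in candidates:
        w = [complex(x) for x in vec]
        for _ in range(2):                         # second pass re-orthogonalizes
            for q in basis:
                proj = inner(q, w)
                w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(inner(w, w).real)
        if norm > tol:                             # skip vectors already in the span
            basis.append([wi / norm for wi in w])
        if len(basis) == p:
            break
    return basis


p, q = 6, 2
A = [steering_vector(theta, p) for theta in (0.3, -0.5)]   # columns a(theta_1), a(theta_2)
G = unitary_completion(A, p)           # the p columns of G = [Q, V]
Q, V = G[:q], G[q:]
leak = max(abs(inner(v, a)) for v in V for a in A)        # largest |entry| of V^H A
print(len(G), leak)
```

Because the first column of $Q$ is $a(\theta_1)/\|a(\theta_1)\|$, the corresponding diagonal entry of $T$ is the positive real number $\|a(\theta_1)\| = \sqrt{p}$ for this array.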


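This matrix $G$ is special. Observe that

$$G^H X = \begin{pmatrix} Q^H_{q\times p} \\ V^H_{(p-q)\times p} \end{pmatrix}X = \begin{pmatrix} Y \\ Z \end{pmatrix}$$

This has partitioned the rows of $G^H X$ into disjoint matrices $Y$ and $Z$. Let us examine these more closely, recalling that $X = AS + N$.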

Since $Q$ spans the same space spanned by $A$, all of the signal component lies in the space spanned by $Q$ and none of the signal component lies in the space spanned by $V$. All of the data in the space spanned by $V$ consists only of noise. It is in this sense that the space spanned by $Q$ is called the “signal subspace”, and the space spanned by $V$ is called the “noise subspace”. Caution: these designations are useful tags, but you must remember that some of the noise is also in the space spanned by $Q$. Not all of the noise is in the space spanned by $V$. This is consistent with hearing noise when we listen to a beamformer output.

Let us find the distributions for $Y$ and $Z$. Once again, apply theorem 41 to get

$$G^H X \sim CN_{p,M}(0, 2G^H(\Sigma + ARA^H)G, I_M)$$
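Take a closer look at the row covariance matrix.

$$2G^H(\Sigma + ARA^H)G = 2\begin{pmatrix} Q^H \\ V^H \end{pmatrix}(\Sigma + ARA^H)\begin{pmatrix} Q & V \end{pmatrix} = 2\begin{pmatrix} Q^H(\Sigma + ARA^H)Q & Q^H(\Sigma + ARA^H)V \\ V^H(\Sigma + ARA^H)Q & V^H(\Sigma + ARA^H)V \end{pmatrix}$$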

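Since all of the signal is projected by $Q$ onto the space spanned by $A$, we know that $V^H A = 0$ and $A^H V = 0$, so the off-diagonal blocks reduce to $Q^H\Sigma V$ and $V^H\Sigma Q$; these are taken to be zero here (they vanish, for example, when $\Sigma = \sigma^2 I$). Thus we get

$$2G^H(\Sigma + ARA^H)G = 2\begin{pmatrix} Q^H(\Sigma + ARA^H)Q & 0 \\ 0 & V^H\Sigma V \end{pmatrix}$$

where $Q^H_{q\times p}(\Sigma + ARA^H)_{p\times p}Q_{p\times q}$ is $q \times q$ and $V^H_{(p-q)\times p}\Sigma_{p\times p}V_{p\times(p-q)}$ is $(p - q) \times (p - q)$. Therefore

$$G^H X = \begin{pmatrix} Y \\ Z \end{pmatrix} \sim CN_{p,M}\left(0,\; 2\begin{pmatrix} Q^H(\Sigma + ARA^H)Q & 0 \\ 0 & V^H\Sigma V \end{pmatrix},\; I_M\right)$$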
By theorem 43, $Y$ and $Z$ are independent. Since

$$Y_{q\times M} = \begin{pmatrix} I_q & 0_{q\times(p-q)} \end{pmatrix}\begin{pmatrix} Y \\ Z \end{pmatrix}$$

we again apply theorem 41 to show

$$Y \sim CN_{q,M}(0, 2Q^H(\Sigma + ARA^H)Q, I_M)$$
Similarly,

$$Z_{(p-q)\times M} = \begin{pmatrix} 0_{(p-q)\times q} & I_{p-q} \end{pmatrix}\begin{pmatrix} Y \\ Z \end{pmatrix}$$

is distributed as

$$Z \sim CN_{(p-q),M}(0, 2V^H\Sigma V, I_M)$$


To form complex Wishart random variables, the underlying matrix complex random variables must have independent rows. Apply corollary 9 to show

$$Y^H \sim CN_{M,q}(0, I_M, 2Q^H(\Sigma + ARA^H)Q)$$

$$Z^H \sim CN_{M,(p-q)}(0, I_M, 2V^H\Sigma V)$$

By definition 6 of the complex Wishart distribution,

$$W_Y = YY^H \sim CW_q(M, 2Q^H(\Sigma + ARA^H)Q)$$

$$W_Z = ZZ^H \sim CW_{p-q}(M, 2V^H\Sigma V)$$

Note that since $Y$ and $Z$ are independent, we know $W_Y$ and $W_Z$ are independent.

We now can take the ratio of arbitrary quadratic forms to obtain an F-distributed statistic. Apply theorem 6, where we observe that $W_Y$ and $W_Z$ are from central complex Wishart distributions. Both noncentrality parameters of the theorem are zero matrices. Thus, we obtain the ordinary F-distribution, without the complications of noncentrality. You can define $C_1$ and $C_2$ in the manner done in the previous theorem. Thus, the statistic

$$F = \frac{M\,C_1^H W_Y C_1\;C_2^H\,2V^H\Sigma V\,C_2}{M\,C_2^H W_Z C_2\;C_1^H\,2Q^H(\Sigma + ARA^H)Q\,C_1} = \frac{C_1^H W_Y C_1\;C_2^H V^H\Sigma V C_2}{C_2^H W_Z C_2\;C_1^H Q^H(\Sigma + ARA^H)Q C_1}$$

is distributed as

$$F \sim \mathrm{dnc}F(2M, 2M, 0, 0) = F(2M, 2M)$$
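The claim that the statistic is $F(2M, 2M)$ rests on each normalized quadratic form $C^H W C/C^H\Omega C$ being a sum of $M$ independent unit-mean exponential variables. The Monte Carlo sketch below simulates that directly for two arbitrary $2 \times 2$ Hermitian matrices standing in for $2Q^H(\Sigma + ARA^H)Q$ and $2V^H\Sigma V$ (the matrices, vectors, and seed are illustrative assumptions), and compares the sample mean and the fraction below 1 with the $F(2M, 2M)$ values $M/(M - 1)$ and $1/2$.

```python
import math
import random


def complex_normal(rng):
    """Circular complex normal with E|z|^2 = 1."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def cholesky2(omega):
    """Lower-triangular L with L L^H = omega for a 2 x 2 Hermitian positive definite omega."""
    (a, b), (_, d) = omega
    l11 = math.sqrt(a.real)
    l21 = b.conjugate() / l11
    l22 = math.sqrt(d.real - abs(b) ** 2 / a.real)
    return l11, l21, l22


def normalized_quad_form(rng, omega, c, m):
    """c^H W c / c^H omega c for W = sum_{t=1}^m z_t z_t^H with z_t ~ CN_2(0, omega)."""
    l11, l21, l22 = cholesky2(omega)
    num = 0.0
    for _ in range(m):
        u1, u2 = complex_normal(rng), complex_normal(rng)
        z1, z2 = l11 * u1, l21 * u1 + l22 * u2
        num += abs(c[0].conjugate() * z1 + c[1].conjugate() * z2) ** 2
    den = sum(c[i].conjugate() * omega[i][j] * c[j] for i in range(2) for j in range(2)).real
    return num / den


def simulate_f(trials, m, seed=7):
    rng = random.Random(seed)
    omega_y = ((2.0, 0.5 + 0.5j), (0.5 - 0.5j, 1.5))   # stand-in for 2 Q^H (Sigma + A R A^H) Q
    omega_z = ((1.0, 0.2j), (-0.2j, 0.8))              # stand-in for 2 V^H Sigma V
    c1, c2 = (0.6, 0.8j), (1.0, 0.0)                   # arbitrary fixed C_1, C_2
    return [normalized_quad_form(rng, omega_y, c1, m) / normalized_quad_form(rng, omega_z, c2, m)
            for _ in range(trials)]


M = 10
samples = simulate_f(10_000, M)
mean_f = sum(samples) / len(samples)
below_one = sum(f < 1.0 for f in samples) / len(samples)
print(mean_f, M / (M - 1), below_one)   # F(2M, 2M) has mean M/(M-1) and median 1
```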


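If the noise is modeled with $\Sigma = \sigma^2 I$, then

$$F = \frac{C_1^H W_Y C_1\,\sigma^2\|C_2\|^2}{C_2^H W_Z C_2\;C_1^H Q^H(\sigma^2 I + ARA^H)Q C_1} = \frac{C_1^H W_Y C_1\,\sigma^2\|C_2\|^2}{C_2^H W_Z C_2\,[\sigma^2\|C_1\|^2 + C_1^H Q^H ARA^H Q C_1]}$$

If you require $\|C_1\|^2 = 1$ and $\|C_2\|^2 = 1$, this further simplifies to

$$F = \frac{C_1^H W_Y C_1\,\sigma^2}{C_2^H W_Z C_2\,[\sigma^2 + C_1^H Q^H ARA^H Q C_1]}$$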


Under the hypothesis that $q = 0$, $W_Y = 0$, $R = 0$, and $F = 0$, a useless triviality. Under the hypothesis that $q = 1$, $W_Y$ and $R$ are scalars. We get

$$F = \frac{W_Y\,\sigma^2}{C_2^H W_Z C_2\,[\sigma^2 + R\,|Q^H A|^2]} \sim F(2M, 2M)$$

If $\|a(\theta)\| = 1$, then $Q = A$ and this further simplifies to

$$F = \frac{W_Y\,\sigma^2}{C_2^H W_Z C_2\,[\sigma^2 + R]} \sim F(2M, 2M)$$


□ 




6.2.3 Test that Requires Only Partial Knowledge of the Covariance Matrix

The real-variable version of the following test statistic appears as Theorem 3.2.20 in [187], and it is nearly the same as the sphericity test given by Anderson. The form of the density is tedious to compute, and I have evaluated it completely only for the bivariate case.


Independence  of  Sphericity  Test  Statistic  and  the  Trace  Function 

Theorem 10 Let $A \sim CW_p(n, \lambda^2 I_p)$ where $n > p$ is an integer. Then $u = \dfrac{\det A}{[\frac{1}{p}\operatorname{tr}A]^p}$ and $v = \operatorname{tr}A$ are independent. This is a complexification of Muirhead [187] theorem 3.2.20.


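Proof. Let $D = \operatorname{diag}(l_1, \ldots, l_p)$ contain the eigenvalues of $A$. Then by corollary 21

$$dF(D) = \frac{\pi^{p(p-1)}\exp\left(-\frac{1}{\lambda^2}\sum_{i=1}^p l_i\right)}{\lambda^{2pn}\,C\Gamma_p(n)\,C\Gamma_p(p)}\prod_{i=1}^p l_i^{n-p}\prod_{i<j}(l_i - l_j)^2\,(dD)$$

Change variables from $(l_1, \ldots, l_p)$ to $(\eta, y_1, \ldots, y_{p-1})$ given by $\eta = \frac{1}{p}\sum_{i=1}^p l_i = \frac{1}{p}\operatorname{tr}A$, and $y_i = l_i/\eta$ for $1 \le i \le p$. Note that $\sum_{i=1}^p y_i = p$. Then

$$u = \frac{\det A}{[\frac{1}{p}\operatorname{tr}A]^p} = \frac{\prod_{i=1}^p l_i}{\eta^p} = \prod_{i=1}^p y_i$$

Note that $u$ lies in the closed interval $[0, 1]$.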




Note that the $l_i > 0$ are all real numbers, which affects the form of the Jacobian. Unlike many other changes of variables in this thesis, where Jacobians needed to account for complex variables, here we can use Jacobians we have computed for real variables. First, change variables from $(l_1, \ldots, l_p)$ to $(l_1, \ldots, l_{p-1}, \eta)$. The transformation matrix is the familiar

$$\begin{pmatrix} I_{p-1} & \mathbf{0} \\ \tfrac{1}{p}\mathbf{1}_{p-1}^T & \tfrac{1}{p} \end{pmatrix}$$

which has determinant $\frac{1}{p}$. The Jacobian is $p$.

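The first step in our change of variables is given below for

$$C = \frac{\pi^{p(p-1)}}{\lambda^{2pn}\,C\Gamma_p(n)\,C\Gamma_p(p)}$$

$$dF(l_1, \ldots, l_{p-1}, \eta) = dF(D_1) = C\exp\left(-\frac{p}{\lambda^2}\eta\right)\left[\prod_{i=1}^{p-1}l_i^{n-p}\right]\left(p\eta - \sum_{i=1}^{p-1}l_i\right)^{n-p}\prod_{1\le i<j\le p}(l_i - l_j)^2\;p\,(dD_1)$$

with $l_p = p\eta - \sum_{i=1}^{p-1}l_i$ in the last product.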
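The transformation matrix from $(l_1, \ldots, l_{p-1}, \eta)$ to $(y_1, \ldots, y_{p-1}, \eta)$ is

$$\begin{pmatrix} \eta I_{p-1} & \mathbf{y} \\ \mathbf{0}^T & 1 \end{pmatrix}, \qquad \mathbf{y} = (y_1, \ldots, y_{p-1})^T$$

which has determinant $\eta^{p-1}$. The Jacobian is

$$\eta^{p-1} = J[(l_1, \ldots, l_{p-1}, \eta) \to (y_1, \ldots, y_{p-1}, \eta)]$$

Thus

$$dF(y_1, \ldots, y_{p-1}, \eta) = dF(Y) = C\exp\left(-\frac{p}{\lambda^2}\eta\right)\left[\prod_{i=1}^{p-1}(\eta y_i)^{n-p}\right]\left(p\eta - \sum_{i=1}^{p-1}\eta y_i\right)^{n-p}\prod_{1\le i<j\le p}(\eta y_i - \eta y_j)^2 \times p\,\eta^{p-1}\,(dY)$$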


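Now factor the joint density into a form having a term that is a function of only $\eta$.

$$dF(Y) = C\,\eta^{(n-p)(p-1)}\,\eta^{n-p}\,\eta^{2(p-1)(p-2)/2}\,\eta^{2(p-1)}\,\eta^{p-1}\,p\exp\left(-\frac{p}{\lambda^2}\eta\right)\left[\prod_{i=1}^{p-1}y_i^{n-p}\right]\left(p - \sum_{i=1}^{p-1}y_i\right)^{n-p}\prod_{1\le i<j\le p-1}(y_i - y_j)^2\prod_{i=1}^{p-1}\left(y_i - p + \sum_{k=1}^{p-1}y_k\right)^2(dY)$$

Collecting powers of $\eta$ gives us

$$dF(Y) = C\,p\,\eta^{np-1}\exp\left(-\frac{p}{\lambda^2}\eta\right)\left[\prod_{i=1}^p y_i^{n-p}\right]\prod_{i<j}(y_i - y_j)^2\,(dY)$$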
where $y_p = p - \sum_{i=1}^{p-1}y_i$ is used for shorthand notation and is not part of the change of variables, and the exponent of $\eta$ is computed from

$$(n - p)(p - 1) + (n - p) + (p - 1)(p - 2) + 3(p - 1) = p(n - p) + (p - 1)(p + 1) = np - p^2 + p^2 + p - p - 1 = np - 1$$

By the Neyman-Fisher Factorization Theorem, we see that $\eta$ is independent of $(y_1, \ldots, y_{p-1})$. $y_p$ is a function only of $(y_1, \ldots, y_{p-1})$, and $u$ is a function only of $(y_1, \ldots, y_p)$. The variable $v = \operatorname{tr}A = p\eta$ is a function only of $\eta$. Therefore $u$ and $v$ are statistically independent, which proves the theorem. □
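Theorem 10 can be illustrated (not proved) by simulation in the bivariate case. This sketch draws $A \sim CW_2(n, \lambda^2 I_2)$ as $X^H X$, computes $u = \det A/(\tfrac12\operatorname{tr}A)^2$ and $v = \operatorname{tr}A$, and checks that their sample correlation is near zero, that $0 < u \le 1$, and that the mean of $v$ is near $\lambda^2 np$ (the value obtained from theorem 88 in the text). Zero correlation is only a necessary consequence of independence; the parameter values and seed are arbitrary, and `lam2` stands for $\lambda^2$.

```python
import math
import random


def sample_u_v(n, lam2, rng):
    """One draw of (u, v) for A = X^H X, X an n x 2 matrix with i.i.d. CN(0, lam2) entries."""
    s = math.sqrt(lam2 / 2.0)
    a11 = a22 = 0.0
    a12 = 0j
    for _ in range(n):
        x1 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        x2 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        a11 += abs(x1) ** 2
        a22 += abs(x2) ** 2
        a12 += x1.conjugate() * x2
    det_a = a11 * a22 - abs(a12) ** 2
    tr_a = a11 + a22
    return det_a / (tr_a / 2.0) ** 2, tr_a        # (u, v) with p = 2


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(11)
n, p, lam2 = 5, 2, 1.5
draws = [sample_u_v(n, lam2, rng) for _ in range(20_000)]
us = [u for u, _ in draws]
vs = [v for _, v in draws]
print(pearson(us, vs), sum(vs) / len(vs), lam2 * n * p)
```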

The statistic $u = \dfrac{\det A}{[\frac{1}{p}\operatorname{tr}A]^p}$ is used to test the hypothesis $H_0: \Sigma = \lambda^2 I_p$ versus the alternative hypothesis $H_a: \Sigma \ne \lambda^2 I_p$ for some fixed (but not necessarily known) $\lambda^2$. When $H_0$ is true, the cumulative distribution function is given by $F_u(x)$, which also depends on the parameters $p$ and $n$. When the sample eigenvalues are equal, $u = 1$, which is the maximum value of $u$. We know $u \le 1$ by Hardy (p. 17, Theorem 9) [102]. We know from Okamoto [197] that the sample eigenvalues will all be different with probability 1. So, the smaller $u$ is, the more evidence there is that $\Sigma = \lambda^2 I_p$ is not true. We want to choose a value $x$ so that when $u < x$ we can decide to reject $H_0: \Sigma = \lambda^2 I_p$, with the probability of rejecting $H_0$ when $H_0$ is true being no greater than $\alpha$. Thus, we choose $x$ so that

$$\Pr(u < x \mid H_0: \Sigma = \lambda^2 I_p) = \alpha = F_u(x)$$

This is a one-sided test.
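When the exact distribution $F_u$ is not at hand, the critical value $x$ can be approximated by Monte Carlo. The sketch below is such a stand-in for $p = 2$, not the exact bivariate density derived in the thesis: it simulates $u$ under $H_0$ (where $u$ does not depend on $\lambda^2$, so $\lambda^2 = 1$ is used), takes the empirical $\alpha$-quantile, and checks the rejection rate on a fresh sample. The sample sizes, seeds, and function names are arbitrary choices.

```python
import math
import random


def sample_u(n, rng):
    """u = det A / (tr A / 2)^2 for A ~ CW_2(n, I_2); u is scale invariant under H0."""
    s = math.sqrt(0.5)
    a11 = a22 = 0.0
    a12 = 0j
    for _ in range(n):
        x1 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        x2 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        a11 += abs(x1) ** 2
        a22 += abs(x2) ** 2
        a12 += x1.conjugate() * x2
    return (a11 * a22 - abs(a12) ** 2) / ((a11 + a22) / 2.0) ** 2


def mc_critical_value(n, alpha, trials, seed=3):
    """Empirical alpha-quantile of u under H0, an estimate of x with F_u(x) = alpha."""
    rng = random.Random(seed)
    us = sorted(sample_u(n, rng) for _ in range(trials))
    return us[int(alpha * trials)]


n, alpha = 4, 0.05
x = mc_critical_value(n, alpha, 20_000)
check_rng = random.Random(99)
rate = sum(sample_u(n, check_rng) < x for _ in range(20_000)) / 20_000
print(x, rate)   # the rejection rate on fresh H0 samples should be close to alpha
```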
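To obtain the marginal density $dF(y_1, \ldots, y_{p-1})$, integrate $\eta$ out. Examining only those terms that contain $\eta$, evaluate

$$I = \int_0^\infty\exp\left(-\frac{p}{\lambda^2}\eta\right)\eta^{np-1}\,d\eta$$

In the integral $\int_0^\infty e^{-\alpha x}x^m\,dx$ given in corollary 48, let $\alpha = p/\lambda^2$ and $m = np - 1$. Then

$$I = \left(\frac{\lambda^2}{p}\right)^{np}(np - 1)!$$

Thus for

$$C_1 = C\,I\,p = \frac{(np - 1)!\,\pi^{p(p-1)}}{p^{np-1}\,C\Gamma_p(n)\,C\Gamma_p(p)}$$

then

$$dF(y_1, \ldots, y_{p-1}) = C_1\left[\prod_{i=1}^p y_i^{n-p}\right]\prod_{i<j}(y_i - y_j)^2\,(d(y_1, \ldots, y_{p-1}))$$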


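Because $(y_1, \ldots, y_{p-1})$ is independent of $\eta$, we find

$$dF(\eta) = \frac{dF(y_1, \ldots, y_{p-1}, \eta)}{dF(y_1, \ldots, y_{p-1})} = \frac{C\,p\exp\left(-\frac{p}{\lambda^2}\eta\right)\eta^{np-1}\left[\prod_{i=1}^p y_i^{n-p}\right]\prod_{i<j}(y_i - y_j)^2}{C_1\left[\prod_{i=1}^p y_i^{n-p}\right]\prod_{i<j}(y_i - y_j)^2}\,d\eta = \left(\frac{p}{\lambda^2}\right)^{np}\frac{\eta^{np-1}}{(np - 1)!}\exp\left(-\frac{p}{\lambda^2}\eta\right)d\eta$$

This is the probability density function of the average of the sample eigenvalues of $A \sim CW_p(n, \lambda^2 I_p)$. To find $dF(\operatorname{tr}A)$, let $x = \operatorname{tr}A = p\eta$ be a change of variables.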


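$$dF(\operatorname{tr}A) = \left(\frac{p}{\lambda^2}\right)^{np}\frac{1}{(np - 1)!}\exp\left(-\frac{\operatorname{tr}A}{\lambda^2}\right)\left(\frac{\operatorname{tr}A}{p}\right)^{np-1}\frac{d(\operatorname{tr}A)}{p}$$

or

$$dF(\operatorname{tr}A) = \frac{(\operatorname{tr}A)^{np-1}}{\lambda^{2np}\,(np - 1)!}\exp\left(-\frac{\operatorname{tr}A}{\lambda^2}\right)d(\operatorname{tr}A)$$

We get another identity by looking at theorem 88, since

$$\mathcal{E}\{\operatorname{tr}A\} = \int_0^\infty(\operatorname{tr}A)\,dF(\operatorname{tr}A) = n\operatorname{tr}(\lambda^2 I_p) = \lambda^2 np$$

Then

$$\int_0^\infty(\operatorname{tr}A)^{np}\exp\left(-\frac{\operatorname{tr}A}{\lambda^2}\right)d(\operatorname{tr}A) = \lambda^{2(np+1)}\,(np)!$$

for $A \sim CW_p(n, \lambda^2 I_p)$. This same result is more straightforwardly evaluated using the definition of the gamma function, letting $x = \operatorname{tr}A$.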
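The trace density and the moment identity can be checked numerically. The sketch assumes the form derived above, a gamma density with shape $np$ and scale $\lambda^2$ (`lam2` stands for $\lambda^2$), and uses a midpoint rule on a truncated range; the parameter values and the cutoff are illustrative.

```python
import math


def trace_density(x, n, p, lam2):
    """dF(tr A)/d(tr A): gamma density with shape np and scale lambda^2."""
    k = n * p
    return x ** (k - 1) * math.exp(-x / lam2) / (lam2 ** k * math.factorial(k - 1))


def midpoint(func, a, b, steps):
    """Composite midpoint rule on [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (j + 0.5) * width) for j in range(steps))


n, p, lam2 = 3, 2, 1.5
upper = 200.0                                 # the integrands are negligible beyond this
mass = midpoint(lambda x: trace_density(x, n, p, lam2), 0.0, upper, 100_000)
mean = midpoint(lambda x: x * trace_density(x, n, p, lam2), 0.0, upper, 100_000)
moment = midpoint(lambda x: x ** (n * p) * math.exp(-x / lam2), 0.0, upper, 100_000)
closed = lam2 ** (n * p + 1) * math.factorial(n * p)      # lambda^{2(np+1)} (np)!
print(mass, mean, lam2 * n * p, moment, closed)
```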


Sphericity  Test  Statistic  Density  Function 


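We would like to find the density function for

$$u = \frac{\det A}{[\frac{1}{p}\operatorname{tr}A]^p} = \frac{\prod_{i=1}^p l_i}{\eta^p} = \prod_{i=1}^p y_i$$

We know $\sum_{i=1}^p y_i = p$, and the joint density of $(y_1, \ldots, y_{p-1})$ is

$$dF(y_1, \ldots, y_{p-1}) = C_1\left[\prod_{i=1}^p y_i^{n-p}\right]\left[\prod_{i<j}(y_i - y_j)^2\right](d(y_1, \ldots, y_{p-1})) \qquad (6.15)$$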


We need to do a change of variables $u = \prod_{i=1}^p y_i$ and $z_i = y_i$ for $2 \le i \le p - 1$, and then integrate out the $z_i$. The challenge is to handle the nonlinearity introduced by $y_p = p - \sum_{i=1}^{p-1}y_i$ when evaluating both the $(y_i - y_j)^2$ terms and the Jacobian. The issue arises in evaluating $y_1$. We compute the inverse mappings now.

$$y_i = z_i, \qquad 2 \le i \le p - 1$$

$$y_p = p - y_2 - y_3 - \cdots - y_{p-1} - y_1$$

$$u = y_1 y_2\cdots y_{p-1}y_p = y_1 y_2\cdots y_{p-1}(p - y_1 - y_2 - \cdots - y_{p-1})$$

$$= y_1 y_2\cdots y_{p-1}(p - y_2 - y_3 - \cdots - y_{p-1}) - y_1^2 y_2\cdots y_{p-1}$$

p-t 

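Let $v = \prod_{i=2}^{p-1}y_i = y_2 y_3\cdots y_{p-1}$ and $w = \sum_{i=2}^{p-1}y_i$ as a shorthand notation. Then $u = y_1 v(p - w) - y_1^2 v$. Note that $v = z_2 z_3\cdots z_{p-1}$ and $w = z_2 + z_3 + \cdots + z_{p-1}$.

Solve for $y_1$ in terms of the new variables using completion of squares.

$$y_1^2 v - y_1 v(p - w) = -u$$

$$y_1^2 - y_1(p - w) + \tfrac{1}{4}(p - w)^2 = -\frac{u}{v} + \tfrac{1}{4}(p - w)^2$$

$$\left(y_1 - \tfrac{1}{2}(p - w)\right)^2 = \tfrac{1}{4}(p - w)^2 - \frac{u}{v}$$

$$y_1 - \tfrac{1}{2}(p - w) = \pm\left[\tfrac{1}{4}(p - w)^2 - \frac{u}{v}\right]^{1/2}$$

$$y_1 = \tfrac{1}{2}(p - w) \pm \left[\tfrac{1}{4}(p - w)^2 - \frac{u}{v}\right]^{1/2}$$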
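The completed-square solution can be checked numerically for $p = 4$, where $z = (z_2, z_3)$. The sketch (illustrative values with $\sum y_i = p$; the function names are hypothetical) confirms that both roots reproduce the same $u$. Because the two roots sum to $t = p - w = y_1 + y_p$, the second root is just $y_p$; in this example the $(+)$ root recovers $y_1$ and the $(-)$ root recovers $y_p$.

```python
import math


def y1_roots(u, z, p):
    """Both solutions for y_1 given u and z = (z_2, ..., z_{p-1})."""
    v, w = math.prod(z), sum(z)
    t = p - w
    s = math.sqrt(t * t / 4.0 - u / v)
    return t / 2.0 + s, t / 2.0 - s            # (+) solution, (-) solution


def u_of(y1, z, p):
    """u = y_1 y_2 ... y_{p-1} y_p with y_p = p - y_1 - (z_2 + ... + z_{p-1})."""
    return y1 * math.prod(z) * (p - y1 - sum(z))


p = 4
y = [1.5, 1.2, 0.9, 0.4]                # illustrative point with sum(y) == p
z = y[1:p - 1]
u = math.prod(y)
plus, minus = y1_roots(u, z, p)
print(plus, minus, u_of(plus, z, p), u_of(minus, z, p), u)
```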

We have two different values of $y_1$ that, together with $(y_2, \ldots, y_{p-1})$, map into the same value of $(u, z_2, \ldots, z_{p-1})$. Let $dF(u, z_2, \ldots, z_{p-1})$ be the joint probability density function of the transformed variables. It will be the sum of two functions representing the transformations from sets $A_1$ and $A_2$ into the set $B$, where set $A_1$ corresponds to all values of $y_1$ obtained with the $(+)$ solution and $A_2$ corresponds to all the values of $y_1$ obtained with the $(-)$ solution. To write these functions, we need to evaluate the Jacobian belonging to each $A_i$. Let

$$\phi(Y) = dF(y_1, \ldots, y_{p-1})$$
to make the following discussion unambiguous. Let $g_i$ be the inverse mappings from $B$ to $A_i$. Let $J_i(Y \to Z)$ denote the Jacobian that transforms variables $(y_1, \ldots, y_{p-1})$ in $A_i$ into $(u, z_2, \ldots, z_{p-1})$ in $B$. Then

$$dF(u, z_2, \ldots, z_{p-1}) \stackrel{\text{def}}{=} dF(Z) = \phi[g_1(Z)]\,|J_1(Y \to Z)| + \phi[g_2(Z)]\,|J_2(Y \to Z)|$$

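The Jacobian will be tedious, but straightforward, to evaluate because $y_j = z_j$ for $2 \le j \le p - 1$. The Jacobian will thus have the form

$$|J_i(Y \to Z)| = \left|\det\begin{pmatrix} \dfrac{\partial y_1}{\partial u} & \dfrac{\partial y_2}{\partial u} & \cdots & \dfrac{\partial y_{p-1}}{\partial u} \\[2mm] \dfrac{\partial y_1}{\partial z_2} & \dfrac{\partial y_2}{\partial z_2} & \cdots & \dfrac{\partial y_{p-1}}{\partial z_2} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial y_1}{\partial z_{p-1}} & \dfrac{\partial y_2}{\partial z_{p-1}} & \cdots & \dfrac{\partial y_{p-1}}{\partial z_{p-1}} \end{pmatrix}\right| = \left|\det\begin{pmatrix} \dfrac{\partial y_1}{\partial u} & 0 & \cdots & 0 \\ \dfrac{\partial y_1}{\partial z_2} & 1 & & 0 \\ \vdots & & \ddots & \\ \dfrac{\partial y_1}{\partial z_{p-1}} & 0 & & 1 \end{pmatrix}\right| = \left|\frac{\partial y_1}{\partial u}\right|$$

The other terms drop out of the expansion of the above determinant down the first column because the cofactor matrices of all $\partial y_1/\partial z_j$ contain a first row of all zeros. Therefore, the determinants of those cofactor matrices evaluate to zero, and we do not have to evaluate the messy terms $\partial y_1/\partial z_j$. Now, evaluate $\partial y_1/\partial u$ for $A_1$ and $A_2$.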
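In $A_1$, we use the $(+)$ solution for $y_1$.

$$\frac{\partial y_1}{\partial u} = \frac{\partial}{\partial u}\left\{\tfrac{1}{2}(p - w) + \left[\tfrac{1}{4}(p - w)^2 - \frac{u}{v}\right]^{1/2}\right\} = -\frac{1}{2v}\left[\tfrac{1}{4}(p - w)^2 - \frac{u}{v}\right]^{-1/2}$$

$$= -\{(vp - vw)^2 - 4uv\}^{-1/2}$$

$$= -\{[z_2\cdots z_{p-1}(p - z_2 - \cdots - z_{p-1})]^2 - 4u z_2\cdots z_{p-1}\}^{-1/2}$$

$$= -\{(vt)^2 - 4uv\}^{-1/2}$$

for $t = p - w$.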

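In A_2, we use the (−) solution for y_1:

$$
\begin{aligned}
\frac{\partial y_1}{\partial u}
&= \frac{\partial}{\partial u}\left\{\tfrac12(p-w) - \left[\tfrac14(p-w)^2 - \frac{u}{v}\right]^{1/2}\right\}
= \frac{1}{2v}\left[\tfrac14(p-w)^2 - \frac{u}{v}\right]^{-1/2}\\
&= \{(vp - vw)^2 - 4uv\}^{-1/2}\\
&= \{[z_2\cdots z_{p-1}(p - z_2 - \cdots - z_{p-1})]^2 - 4u z_2\cdots z_{p-1}\}^{-1/2}\\
&= \{(vt)^2 - 4uv\}^{-1/2}
\end{aligned}
$$

for t = p − w. Thus

$$
\frac{\partial y_1}{\partial u}\bigg|_{A_1} = -\frac{\partial y_1}{\partial u}\bigg|_{A_2},
$$

so the absolute value of the Jacobian is the same on A_1 and A_2.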

Now we know we can simplify our expression for dF(Z):

$$
dF(Z) = \phi[g_1(Y)]\,|J_1(Y \to Z)| + \phi[g_2(Y)]\,|J_2(Y \to Z)| = \{\phi[g_1(Y)] + \phi[g_2(Y)]\}\,|J(Y \to Z)|
$$

where

$$
|J| = \{[z_2\cdots z_{p-1}(p - z_2 - \cdots - z_{p-1})]^2 - 4u z_2\cdots z_{p-1}\}^{-1/2}
$$
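The root selection and the common Jacobian factor can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper names `y1_roots`, `product_of_y`, and `jacobian_abs` are hypothetical, and `z` holds (z_2, …, z_{p−1}). Both roots should reproduce u = y_1 y_2 ⋯ y_p, and a central finite difference of each root should match ∓|J|.

```python
import math


def y1_roots(u, z, p):
    """Return the (+) and (-) roots for y_1, given u and z = (z_2, ..., z_{p-1})."""
    w = sum(z)
    v = math.prod(z)  # empty product (p == 2) is 1
    t = p - w
    disc = t * t / 4.0 - u / v
    if disc < 0.0:
        raise ValueError("no real root: t^2/4 < u/v")
    r = math.sqrt(disc)
    return t / 2.0 + r, t / 2.0 - r


def product_of_y(y1, z, p):
    """u = y_1 y_2 ... y_p, with y_j = z_j (2 <= j <= p-1) and y_p = p - y_1 - w."""
    yp = p - y1 - sum(z)
    return y1 * math.prod(z) * yp


def jacobian_abs(u, z, p):
    """|J(Y -> Z)| = {(vt)^2 - 4uv}^(-1/2), identical on A_1 and A_2."""
    v = math.prod(z)
    t = p - sum(z)
    return ((v * t) ** 2 - 4.0 * u * v) ** -0.5
```

For example, with p = 4, z = (1.1, 0.9), and u = 0.5, both roots give a product of 0.5, and the slope of the (+) root with respect to u is −|J| ≈ −0.718.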


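Concentrate on evaluating φ[g_1(Y)]. The challenge is to find a convenient expression for ∏_{i<j}^{p}(y_i − y_j)². For 2 ≤ i < j ≤ p − 1, the easy terms to evaluate, we get (z_i − z_j)². Consider (y_i − y_p)² for 2 ≤ i ≤ p − 1. This is

$$
(y_i - p + y_1 + y_2 + \cdots + y_{p-1})^2 = (y_1 - p + z_i + w)^2
= \left\{z_i - \tfrac12(p - w) + \left[\tfrac14(p - w)^2 - \frac{u}{v}\right]^{1/2}\right\}^2
$$

where w = z_2 + z_3 + ⋯ + z_{p−1} and v = z_2 z_3 ⋯ z_{p−1}. Let t = p − w to simplify slightly to get

$$
(y_i - y_p)^2 = \left\{z_i - \tfrac12 t + \left[\tfrac14 t^2 - \frac{u}{v}\right]^{1/2}\right\}^2
$$

This is as simplified as I have been able to get. Now consider (y_1 − y_j)² for 2 ≤ j ≤ p − 1:

$$
(y_1 - y_j)^2 = \left\{\tfrac12 t - z_j + \left[\tfrac14 t^2 - \frac{u}{v}\right]^{1/2}\right\}^2
$$

Finally, we evaluate (y_1 − y_p)². Here, we have

$$
(y_1 - y_p)^2 = \{y_1 - (p - y_1 - y_2 - \cdots - y_{p-1})\}^2 = (2y_1 - p + w)^2 = (2y_1 - t)^2 = 4\left[\tfrac14 t^2 - \frac{u}{v}\right] = t^2 - \frac{4u}{v}
$$

Putting this together, we find

$$
h_1 \overset{\text{def}}{=} \prod_{i<j}^{p}(y_i - y_j)^2_{+}
= \left(t^2 - \frac{4u}{v}\right)
\prod_{j=2}^{p-1}\left\{\tfrac12 t - z_j + \left[\tfrac14 t^2 - \frac{u}{v}\right]^{1/2}\right\}^2
\prod_{i=2}^{p-1}\left\{z_i - \tfrac12 t + \left[\tfrac14 t^2 - \frac{u}{v}\right]^{1/2}\right\}^2
\prod_{2 \le i<j \le p-1}(z_i - z_j)^2
$$

where t = p − w = p − z_2 − z_3 − ⋯ − z_{p−1} and v = z_2 z_3 ⋯ z_{p−1}. The identity ∏_{i=1}^{p} y_i^{n−p} = u^{n−p} holds for both the "plus" solution density g_1 and the "minus" solution density g_2. So,

$$
\phi[g_1(Y)] = C_1 u^{n-p} h_1
$$

with the constant C_1 given with equation (6.16).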


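Now, evaluate φ[g_2(Y)], which corresponds to the (−) solution for y_1. We evaluate ∏_{i<j}(y_i − y_j)² once more. For 2 ≤ i < j ≤ p − 1, we get (y_i − y_j)² = (z_i − z_j)² as before. Consider (y_i − y_p)² for 2 ≤ i ≤ p − 1. Then

$$
(y_i - y_p)^2 = \left\{z_i - \tfrac12 t - \left[\tfrac14 t^2 - \frac{u}{v}\right]^{1/2}\right\}^2
$$

This differs from the previous evaluation by the sign change for the coefficient of [¼t² − u/v]^{1/2}. Now consider (y_1 − y_j)² for 2 ≤ j ≤ p − 1. This is

$$
(y_1 - y_j)^2 = \left\{\tfrac12 t - z_j - \left[\tfrac14 t^2 - \frac{u}{v}\right]^{1/2}\right\}^2
$$

Finally,

$$
(y_1 - y_p)^2 = (2y_1 - t)^2 = t^2 - \frac{4u}{v}
$$

Thus

$$
h_2 \overset{\text{def}}{=} \prod_{i<j}^{p}(y_i - y_j)^2_{-}
= \left(t^2 - \frac{4u}{v}\right)
\prod_{j=2}^{p-1}\left\{\tfrac12 t - z_j - \left[\tfrac14 t^2 - \frac{u}{v}\right]^{1/2}\right\}^2
\prod_{i=2}^{p-1}\left\{z_i - \tfrac12 t - \left[\tfrac14 t^2 - \frac{u}{v}\right]^{1/2}\right\}^2
\prod_{2 \le i<j \le p-1}(z_i - z_j)^2
$$

and φ[g_2(Y)] = C_1 u^{n−p} h_2.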
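Because the factored forms of h_1 and h_2 are easy to get wrong by a sign, a direct numerical comparison is a useful safeguard. This is a sketch under the definitions above (the function names are hypothetical): it builds (y_1, …, y_p) from (u, z_2, …, z_{p−1}) for either root and compares the brute-force product ∏_{i<j}(y_i − y_j)² with the closed form, writing the root as t/2 + s with s = ±[¼t² − u/v]^{1/2}.

```python
import math
from itertools import combinations


def y_vector(u, z, p, sign):
    """(y_1, ..., y_p) for the (+) root (sign=+1) or the (-) root (sign=-1)."""
    w, v = sum(z), math.prod(z)
    t = p - w
    y1 = t / 2.0 + sign * math.sqrt(t * t / 4.0 - u / v)
    return [y1, *z, p - y1 - w]


def vandermonde_sq(y):
    """Direct evaluation of prod_{i<j} (y_i - y_j)^2."""
    return math.prod((a - b) ** 2 for a, b in combinations(y, 2))


def h_closed_form(u, z, p, sign):
    """h_1 (sign=+1) or h_2 (sign=-1) from the factored expressions in the text."""
    w, v = sum(z), math.prod(z)
    t = p - w
    s = sign * math.sqrt(t * t / 4.0 - u / v)
    h = t * t - 4.0 * u / v                 # (y_1 - y_p)^2
    for zj in z:
        h *= (t / 2.0 - zj + s) ** 2        # (y_1 - y_j)^2
        h *= (zj - t / 2.0 + s) ** 2        # (y_j - y_p)^2
    for zi, zj in combinations(z, 2):
        h *= (zi - zj) ** 2                 # (z_i - z_j)^2
    return h
```

For p = 2 the tuple `z` is empty, v = 1, t = 2, and both forms reduce to 4 − 4u.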


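The probability density function for the new variables (u, z_2, …, z_{p−1}) is

$$
dF(u, z_2, \dots, z_{p-1}) = dF(Z) = C_1 u^{n-p}\{h_1 + h_2\}\left[(vt)^2 - 4uv\right]^{-1/2} d(u, z_2, \dots, z_{p-1}) \tag{6.16}
$$

where

$$
C_1 = \frac{(np - 1)!\,\pi^{p(p-1)}}{p^{np-1}\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}
$$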
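The constant C_1 can be evaluated directly. The sketch below assumes that CΓ_p(a) is the usual complex multivariate gamma function, π^{p(p−1)/2} ∏_{i=1}^{p} Γ(a − i + 1); this matches the identity CΓ_2(n) = πΓ(n)Γ(n − 1) used later in this section, but the function names are hypothetical. As a check, it reproduces the simplified p = 3 constant given below.

```python
import math


def complex_multivariate_gamma(p, a):
    """C Gamma_p(a) = pi^{p(p-1)/2} * prod_{i=1}^{p} Gamma(a - i + 1) (assumed definition)."""
    value = math.pi ** (p * (p - 1) / 2.0)
    for i in range(1, p + 1):
        value *= math.gamma(a - i + 1)
    return value


def c1_constant(n, p):
    """C_1 of equation 6.16."""
    numerator = math.factorial(n * p - 1) * math.pi ** (p * (p - 1))
    denominator = (p ** (n * p - 1)
                   * complex_multivariate_gamma(p, n)
                   * complex_multivariate_gamma(p, p))
    return numerator / denominator
```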

To find the density of u, integrate out (z_2, …, z_{p−1}). The limits of integration are

  (0, z_{k−1}) for z_k, where 3 ≤ k ≤ p − 1
  (0, ∞) for z_2

The case p = 1 is trivial. The case p = 2 is tractable, and a closed-form solution is presented for F_u(x) = Pr(u < x). The case p = 3 is tedious and should be evaluated by some automated means. When p > 3, the general approach above is the appropriate tactic, but the required integration is tedious for the general case. Let us look at some small values of p.

There is no point in doing an evaluation for p = 1. When p = 1, then u = 1 since y_1 = p = 1. Thus Pr(u = 1) = 1.

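We are still dealing with a special case when p = 2. Let us go back to basics. With Λ² = λ²I (the sphericity hypothesis), the density of the two sample eigenvalues is

$$
dF(l_1^2, l_2^2) = \frac{\pi^2}{\lambda^{4n}\,\mathrm{C}\Gamma_2(n)\,\mathrm{C}\Gamma_2(2)}\,(l_1^2 l_2^2)^{n-2}\exp\left\{-\frac{l_1^2 + l_2^2}{\lambda^2}\right\}(l_1^2 - l_2^2)^2\,d(l_1^2, l_2^2)
$$

Let η = ½(l_1² + l_2²). Note: l_2² = 2η − l_1². Then the joint density of (l_1², η) is

$$
dF(l_1^2, \eta) = C\exp\left\{-\frac{2\eta}{\lambda^2}\right\}\left[l_1^2(2\eta - l_1^2)\right]^{n-2}(2l_1^2 - 2\eta)^2\,d(l_1^2, \eta)
$$

where

$$
C = \frac{2\pi^2}{\lambda^{4n}\,\mathrm{C}\Gamma_2(n)\,\mathrm{C}\Gamma_2(2)}
$$

We note that

$$
\mathrm{C}\Gamma_2(n) = \pi\Gamma(n)\Gamma(n-1) = \pi(n-1)!(n-2)!, \qquad \mathrm{C}\Gamma_2(2) = \pi
$$

Now let y_1 = l_1²/η, so that l_1² = ηy_1 and the Jacobian is η. Then

$$
dF(y_1, \eta) = 4C\,\eta^{2n-1}\exp\left\{-\frac{2\eta}{\lambda^2}\right\}y_1^{n-2}(2 - y_1)^{n-2}(y_1 - 1)^2\,d(y_1, \eta)
$$

and the two integrals we need are

$$
\int_0^\infty \eta^{2n-1}e^{-2\eta/\lambda^2}\,d\eta = (2n-1)!\left(\frac{\lambda^2}{2}\right)^{2n}, \qquad
\int_1^2 y_1^{n-2}(2 - y_1)^{n-2}(y_1 - 1)^2\,dy_1 = \frac{2^{2n-3}(n-1)!(n-2)!}{(2n-1)!}
$$

where the range 1 ≤ y_1 ≤ 2 corresponds to l_1² ≥ l_2². Since y_1 and η are independent, we see that

$$
dF(y_1, \eta) = \frac{(2/\lambda^2)^{2n}\,\eta^{2n-1}e^{-2\eta/\lambda^2}}{(2n-1)!}\cdot\frac{(2n-1)!\,y_1^{n-2}(2 - y_1)^{n-2}(y_1 - 1)^2}{2^{2n-3}(n-1)!(n-2)!}\,d(y_1, \eta), \qquad 1 \le y_1 \le 2
$$

Our assurance that this is correct is that it integrates to 1.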

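Now to find the density of u = y_1(2 − y_1) for the case of p = 2. We see that u = 2y_1 − y_1² implies

$$
y_1^2 - 2y_1 + 1 = 1 - u = (y_1 - 1)^2
$$

Solving for y_1, we find y_1 = 1 ± (1 − u)^{1/2}. We let A_1 be the set of all y_1 = 1 + (1 − u)^{1/2}, and let A_2 be the set of all y_1 = 1 − (1 − u)^{1/2}. In A_1, the critical computation in the Jacobian is dy_1/du = −½(1 − u)^{−1/2}, and in A_2 it is dy_1/du = ½(1 − u)^{−1/2}. Note that |dy_1/du| = |J| is the absolute value of the determinant of the Jacobian matrix. The density of y_1 above is symmetric about y_1 = 1, so we may treat the two eigenvalues as unordered: y_1 then ranges over (0, 2) with half the density above, and each root contributes. Thus, for the special case of p = 2,

$$
\begin{aligned}
dF(u) &= \frac{(2n-1)!}{2^{2(n-1)}(n-1)!(n-2)!}\Big\{[1 + (1-u)^{1/2}]^{n-2}[1 - (1-u)^{1/2}]^{n-2}[(1-u)^{1/2}]^2\\
&\qquad + [1 - (1-u)^{1/2}]^{n-2}[1 + (1-u)^{1/2}]^{n-2}[-(1-u)^{1/2}]^2\Big\}\,\tfrac12(1-u)^{-1/2}\,du\\
&= \frac{(2n-1)!}{2^{2(n-1)}(n-1)!(n-2)!}\,u^{n-2}(1-u)^{1/2}\,du, \qquad 0 \le u \le 1
\end{aligned} \tag{6.17}
$$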
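A quick numerical sanity check of (6.17) is that it integrates to one and that the two-branch sum collapses to the closed form. This is a minimal sketch using a plain composite Simpson rule (chosen only to avoid dependencies; the function names are hypothetical). The two-branch version is evaluated only for 0 < u < 1 because of the (1 − u)^{−1/2} factor.

```python
import math


def p2_density(u, n):
    """Equation 6.17: density of u for p = 2, 0 <= u <= 1, n >= 2."""
    k = math.factorial(2 * n - 1) / (
        2 ** (2 * (n - 1)) * math.factorial(n - 1) * math.factorial(n - 2))
    return k * u ** (n - 2) * math.sqrt(1.0 - u)


def two_branch_density(u, n):
    """The same density assembled from the roots y_1 = 1 +/- sqrt(1-u), using the
    unordered density of y_1 on (0, 2) and |dy_1/du| = (1/2)(1-u)^(-1/2)."""
    k = math.factorial(2 * n - 1) / (
        2 ** (2 * n - 2) * math.factorial(n - 1) * math.factorial(n - 2))
    r = math.sqrt(1.0 - u)
    total = 0.0
    for y1 in (1.0 + r, 1.0 - r):
        total += k * y1 ** (n - 2) * (2.0 - y1) ** (n - 2) * (y1 - 1.0) ** 2
    return total * 0.5 / r


def simpson(f, a, b, m=20000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0
```

The square-root behavior at u = 1 slows Simpson's convergence, but with 20 000 subintervals the integral is still within about 10⁻⁵ of one for small n.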
The cumulative distribution F_u(x) = Pr{u ≤ x} is found by integrating dF(u) over the interval [0, x], where x ∈ [0, 1]. The integral is evaluated by successive application of the chain rule. Note that dF(u), and hence F_u(x), is independent of λ². We truly are testing sphericity without regard to multivariate diameter.

From our combinatoric identity, given in proposition 103 with m = n − 2 and a = ½, we see that

$$
F_u(x) = \int_0^x dF(u) = \frac{(2n-1)!}{2^{2(n-1)}(n-1)!(n-2)!}\int_0^x u^{n-2}(1-u)^{1/2}\,du
$$

$$
F_u(x) = \frac{(2n-1)!}{2^{2(n-1)}(n-1)!(n-2)!}\sum_{k=0}^{n-2}(-1)^k\binom{n-2}{k}\frac{1 - (1-x)^{k+3/2}}{k + \tfrac32} \tag{6.18}
$$

for p = 2 and finite n. When n = 3, Pr{u < …} = 0.19.
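The finite sum (6.18) is easy to evaluate, and it can be cross-checked against a simulation. The sketch below is illustrative only: `p2_cdf_series` and `simulate_u_p2` are hypothetical names, and the simulation assumes the sphericity null with A = Σ_{k=1}^{n} z_k z_k^H, z_k ~ CN(0, I_2), so that u = y_1 y_2 = 4 det A/(tr A)². The scale of z_k does not matter because u is scale invariant.

```python
import math
import random


def p2_cdf_series(x, n):
    """Equation 6.18 (p = 2): F_u(x) as a finite alternating sum, 0 <= x <= 1, n >= 2."""
    k_const = math.factorial(2 * n - 1) / (
        2 ** (2 * (n - 1)) * math.factorial(n - 1) * math.factorial(n - 2))
    total = 0.0
    for k in range(n - 1):                     # k = 0, ..., n-2
        a = k + 1.5
        total += (-1) ** k * math.comb(n - 2, k) * (1.0 - (1.0 - x) ** a) / a
    return k_const * total


def simulate_u_p2(n, trials, seed=1):
    """Monte Carlo draws of u = 4 det(A) / tr(A)^2, where A = sum_{k=1}^{n} z_k z_k^H
    and z_k ~ CN(0, I_2) (sphericity null, assumed)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(trials):
        a = d = 0.0
        b = 0j
        for _ in range(n):
            z1 = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            z2 = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            a += abs(z1) ** 2
            d += abs(z2) ** 2
            b += z1 * z2.conjugate()
        draws.append(4.0 * (a * d - abs(b) ** 2) / (a + d) ** 2)
    return draws
```

For n = 3 the series gives F_u(1/3) ≈ 0.1835, and the empirical fraction of simulated u below 1/3 should agree to within sampling error.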

The number of terms increases explosively with the dimension of the random vector. Even though the case of p = 3 is still a simplified special case, the number of terms is unmanageable using manual methods. In this case we want to find dF(u, z_2) and then

$$
dF(u) = \int_{z_2} dF(u, z_2)
$$

Reviewing some notation, v = z_2 and w = z_2. Evaluate

$$
\prod_{i<j}(y_i - y_j)^2 = (y_1 - y_2)^2(y_1 - y_3)^2(y_2 - y_3)^2
$$

where y_3 = 3 − y_1 − y_2, y_2 = z_2, and

$$
y_1 = \tfrac12(3 - z_2) \pm \left[\tfrac14(3 - z_2)^2 - \frac{u}{z_2}\right]^{1/2}
$$

Note that y_1 − y_3 = 2y_1 + z_2 − 3 and y_2 − y_3 = y_1 + 2z_2 − 3. There is both a (+) and a (−) solution. The (+) solution is given by

$$
\begin{aligned}
h_1 = \prod_{i<j}(y_i - y_j)^2_{+} &= \left\{\tfrac12(3 - z_2) + \left[\tfrac14(3 - z_2)^2 - \frac{u}{z_2}\right]^{1/2} - z_2\right\}^2
\times \left[(3 - z_2)^2 - \frac{4u}{z_2}\right]\\
&\quad \times \left\{\tfrac12(3 - z_2) + \left[\tfrac14(3 - z_2)^2 - \frac{u}{z_2}\right]^{1/2} + 2z_2 - 3\right\}^2
\end{aligned}
$$

The (−) solution is given by

$$
\begin{aligned}
h_2 = \prod_{i<j}(y_i - y_j)^2_{-} &= \left\{\tfrac12(3 - z_2) - \left[\tfrac14(3 - z_2)^2 - \frac{u}{z_2}\right]^{1/2} - z_2\right\}^2
\times \left[(3 - z_2)^2 - \frac{4u}{z_2}\right]\\
&\quad \times \left\{\tfrac12(3 - z_2) - \left[\tfrac14(3 - z_2)^2 - \frac{u}{z_2}\right]^{1/2} + 2z_2 - 3\right\}^2
\end{aligned}
$$

Then

$$
dF(u, z_2) = C_1 u^{n-3}\{h_1 + h_2\}\left[z_2^2(3 - z_2)^2 - 4uz_2\right]^{-1/2} d(u, z_2)
$$

where

$$
C_1 = \frac{(3n-1)!\,\pi^6}{3^{3n-1}\,\mathrm{C}\Gamma_3(n)\,\mathrm{C}\Gamma_3(3)} = \frac{(3n-1)!}{2 \times 3^{3n-1}(n-3)!(n-2)!(n-1)!}
$$

To get the density of the statistic u, integrate over z_2.

Although a symbolic mathematics processor could evaluate the required integral over z_2 for small dimensions in a reasonable time, I think the numerical accuracy resulting from its evaluation would be worse than that obtained by beginning with numerical integration.
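Whatever numerical integration scheme is used for p = 3, an independent cross-check is useful. The sketch below swaps in plain Monte Carlo simulation; it is not the numerical integration proposed above. It assumes the sphericity null, A = Σ_{k=1}^{n} z_k z_k^H with z_k ~ CN(0, I_p), and computes u = p^p det A/(tr A)^p, which equals y_1 ⋯ y_p. All function names are hypothetical.

```python
import bisect
import random


def complex_det(m):
    """Determinant of a square complex matrix by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    size = len(a)
    det = 1 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def sample_u(p, n, rng):
    """One draw of u = p^p det(A) / tr(A)^p, A = sum of n outer products z z^H, z ~ CN(0, I)."""
    a = [[0j] * p for _ in range(p)]
    for _ in range(n):
        z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(p)]
        for i in range(p):
            for j in range(p):
                a[i][j] += z[i] * z[j].conjugate()
    trace = sum(a[i][i].real for i in range(p))
    return p ** p * complex_det(a).real / trace ** p


def empirical_cdf(p, n, xs, trials=5000, seed=2):
    """Monte Carlo estimate of Pr{u <= x} for each x in xs."""
    rng = random.Random(seed)
    draws = sorted(sample_u(p, n, rng) for _ in range(trials))
    return [bisect.bisect_right(draws, x) / trials for x in xs]
```

With n ≥ p the simulated A is positive definite almost surely, so every draw lies in (0, 1] by the arithmetic-geometric mean inequality.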


6.3  Tests  Motivated  by  Krishnaiah 

In this section I provide joint distributions of some desirable test statistics and associated nuisance variables, when the sample eigenvalues obey a special case distribution. The distributions represent the nearest approach in this thesis to solving the original thesis question. These tests were motivated by Krishnaiah's works, but the derivations are independent of Krishnaiah's work.

Krishnaiah has been a central figure in the development of tests on eigenvalues, including those making use of concepts from James' work on zonal polynomials and complex variables. His work is reported primarily in reports for the United States Air Force Aerospace Research Laboratories, and may be obtained through the United States Department of Commerce National Technical Information Service. The ordering information (AD numbers) is included in the bibliography of this thesis. These reports may be characterized by their insight, brevity, and use of a lemma for integration which I have not yet tried to prove for the context of this thesis. Krishnaiah's works are reported in integral form.

The material that follows was directly motivated by the problems which Krishnaiah solved. I have not worked out all the details of Krishnaiah's work, so I have not yet made the necessary connections between his work and the work which follows. That is an important effort in the context of order estimation to be pursued later.

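In all the work to follow, let the sample eigenvalues D = diag(l_1², …, l_p²) estimate the population eigenvalues Λ² = diag(λ_1², …, λ_p²). I will assume the following special case, in which the sample eigenvalues have the joint density function of D given by

$$
\begin{aligned}
dF(D) &= \frac{|\det D|^{n-p}\,\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\operatorname{etr}(-\Lambda^{-2}D)\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD)\\
&= \frac{|\det D|^{n-p}\,\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\exp\left\{-\sum_{k=1}^{p}\frac{l_k^2}{\lambda_k^2}\right\}\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD)
\end{aligned} \tag{6.19}
$$

This is the case when D ~ CW_p(n, Λ²) such that D is further restricted to be diagonal. Thus, the elements of D are independently distributed χ².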


This originally was considered as a result of the observation that zonal polynomials have the property that Z_m(U^H X U) = Z_m(X) for all U ∈ U(n). This leads to saying Z_m(A) = Z_m(D). Further, noting that ₀F₀(−Σ^{−1}A) = etr(−Σ^{−1}A) and

$$
{}_0F_0(X) = \sum_{d=0}^{\infty}\frac{1}{d!}\sum_{|m|=d} Z_m(X),
$$

and substituting this into the density function,

$$
dF(D) = \frac{|\det D|^{n-p}\,\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\left[\sum_{d=0}^{\infty}\frac{1}{d!}\sum_{|m|=d}\frac{Z_m(-\Lambda^{-2})\,Z_m(D)}{Z_m(I)}\right]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD),
$$

led me to consider

$$
dF(D) = \frac{|\det D|^{n-p}\,\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\operatorname{etr}(-\Lambda^{-2}D)\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD).
$$

The problem with this approach is the implication that etr(−Σ^{−1}A) = etr(−Λ^{−2}D), which is not true.


6.3.1  Joint Density of Ratio of Adjacent Sample Eigenvalues
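Proposition 1 Let D = diag(l_1², …, l_p²) be sample eigenvalues corresponding to Λ² = diag(λ_1², …, λ_p²) that have the density function given in equation 6.19. Then the joint density of

$$
\nu = (\theta_1, \dots, \theta_{p-1}), \qquad \theta_i = \frac{l_i^2}{l_{i+1}^2},
$$

is given by

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}
\left[\frac{1}{\lambda_p^2}\bigwedge_{k=p-1}^{1}\left(1 + \frac{\lambda_{k+1}^2}{\lambda_k^2}\,\theta_k\right)\right]^{-np}
\prod_{i=1}^{p-1}\theta_i^{\,i(n-p+i)-1}
\prod_{1 \le i<j \le p}(\theta_i\theta_{i+1}\cdots\theta_{j-1} - 1)^2\,(d\nu)
$$

where ⋀_{k=p−1}^{1}(1 + γ_kθ_k) is a nested sum,

$$
\bigwedge_{k=p-1}^{1}(1 + \gamma_k\theta_k) = 1 + \gamma_{p-1}\theta_{p-1}\Big[1 + \gamma_{p-2}\theta_{p-2}\big[1 + \cdots[1 + \gamma_1\theta_1]\cdots\big]\Big].
$$

This was motivated by the suggested transformations used by Krishnaiah and Waikar [144] related to their equation f.3.

Proof. Starting with equation 6.19, change variables from (l_1², …, l_p²) to (θ_1, …, θ_p), where

$$
\theta_i = \frac{l_i^2}{l_{i+1}^2}, \quad 1 \le i \le p-1, \qquad \theta_p = l_p^2.
$$

The transformation matrix from (l_1², …, l_p²) to (θ_1, …, θ_p) is given by

$$
T = \operatorname{diag}\left(\frac{1}{l_2^2}, \frac{1}{l_3^2}, \dots, \frac{1}{l_p^2}, 1\right).
$$

The Jacobian, in terms of l_i², is given by |det T^{−1}| = |(1/l_1²) det D|. In terms of θ_i, we find

$$
l_1^2 = \theta_1\theta_2\cdots\theta_p = \omega_1(\Theta), \quad
l_2^2 = \theta_2\cdots\theta_p = \omega_2(\Theta), \quad \dots, \quad
l_{p-1}^2 = \theta_{p-1}\theta_p = \omega_{p-1}(\Theta), \quad
l_p^2 = \theta_p = \omega_p(\Theta).
$$

In this form, the Jacobian is given by

$$
\det\left(\frac{\partial\omega_i}{\partial\theta_j}\right) = \det\begin{pmatrix}
\theta_2\theta_3\cdots\theta_p & \theta_1\theta_3\cdots\theta_p & \cdots & \theta_1\cdots\theta_{p-2}\theta_p & \theta_1\cdots\theta_{p-1}\\
0 & \theta_3\cdots\theta_p & \cdots & \theta_2\cdots\theta_{p-2}\theta_p & \theta_2\cdots\theta_{p-1}\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & \theta_p & \theta_{p-1}\\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}
= (\theta_2\cdots\theta_p)(\theta_3\cdots\theta_p)\cdots(\theta_{p-1}\theta_p)\,\theta_p\cdot 1 = \prod_{i=2}^{p}\theta_i^{\,i-1}.
$$

The joint probability density function of Θ = (θ_1, …, θ_p) is given by

$$
\begin{aligned}
dF(\Theta) &= \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\exp\left\{-\sum_{k=1}^{p}\frac{1}{\lambda_k^2}\prod_{j=k}^{p}\theta_j\right\}
\left\{\prod_{i=1}^{p}\left[\prod_{j=i}^{p}\theta_j\right]^{n-p}\right\}\\
&\quad\times\left\{\prod_{i<j}(\theta_j\cdots\theta_p)^2(\theta_i\theta_{i+1}\cdots\theta_{j-1} - 1)^2\right\}\left\{\prod_{i=2}^{p}\theta_i^{\,i-1}\right\}(d\Theta).
\end{aligned}
$$

Note that

$$
\sum_{k=1}^{p}\frac{1}{\lambda_k^2}\prod_{j=k}^{p}\theta_j
= \frac{\theta_1\cdots\theta_p}{\lambda_1^2} + \cdots + \frac{\theta_{p-1}\theta_p}{\lambda_{p-1}^2} + \frac{\theta_p}{\lambda_p^2}
= \frac{\theta_p}{\lambda_p^2}\left[1 + \frac{\lambda_p^2}{\lambda_{p-1}^2}\theta_{p-1}\left[1 + \frac{\lambda_{p-1}^2}{\lambda_{p-2}^2}\theta_{p-2}\left[1 + \cdots\right]\right]\right]
= \frac{\theta_p}{\lambda_p^2}\bigwedge_{k=p-1}^{1}\left(1 + \frac{\lambda_{k+1}^2}{\lambda_k^2}\theta_k\right).
$$

Therefore

$$
dF(\Theta) = \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\exp\left\{-\frac{\theta_p}{\lambda_p^2}\bigwedge_{k=p-1}^{1}\left(1 + \frac{\lambda_{k+1}^2}{\lambda_k^2}\theta_k\right)\right\}
\left\{\prod_{i=1}^{p}\theta_i^{\,i(n-p)}\right\}\left\{\prod_{i=2}^{p}\theta_i^{\,i-1}\right\}\left\{\prod_{i<j}(\theta_j\cdots\theta_p)^2(\theta_i\cdots\theta_{j-1} - 1)^2\right\}(d\Theta).
$$

To obtain the joint density of ν = (θ_1, …, θ_{p−1}), we integrate dF(Θ) on θ_p ∈ (0, ∞). We temporarily simplify the notation to help identify the integration problem. Let

$$
c = \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}, \qquad
\beta = \frac{1}{\lambda_p^2}\bigwedge_{k=p-1}^{1}\left(1 + \frac{\lambda_{k+1}^2}{\lambda_k^2}\theta_k\right),
$$

$$
g_1 = \prod_{i=1}^{p-1}\theta_i^{\,i(n-p)}, \qquad
g_2 = \prod_{i=2}^{p-1}\theta_i^{\,i-1}, \qquad
g_3 = \prod_{1 \le i<j \le p}(\theta_i\theta_{i+1}\cdots\theta_{j-1} - 1)^2.
$$

The idea is for these to be cofactors of θ_p. The justification of g_3 is not obvious, so we give a little more detail. Let

$$
\psi(\Theta) = \prod_{i<j}(\theta_j\cdots\theta_p)^2(\theta_i\cdots\theta_{j-1} - 1)^2.
$$

Each of the p(p − 1)/2 pairs i < j contributes θ_p², and θ_m (1 ≤ m ≤ p − 1) appears in θ_j⋯θ_p exactly when j ≤ m, that is, for the m(m − 1)/2 pairs i < j ≤ m. Hence

$$
\psi(\Theta) = \theta_p^{\,p(p-1)}\left\{\prod_{m=2}^{p-1}\theta_m^{\,m(m-1)}\right\}g_3.
$$

Collecting the powers of θ_p from |det D|^{n−p}, from ψ(Θ), and from the Jacobian gives p(n − p) + p(p − 1) + (p − 1) = np − 1. Using c, β, g_1, g_2, g_3, we rewrite dF(Θ) as

$$
dF(\Theta) = c\,g_1 g_2 g_3\left\{\prod_{m=2}^{p-1}\theta_m^{\,m(m-1)}\right\}\theta_p^{\,np-1}e^{-\beta\theta_p}\,(d\Theta).
$$

We now integrate, using lemma 62:

$$
\int_{\theta_p=0}^{\infty} dF(\Theta) = c\,g_1 g_2 g_3\left\{\prod_{m=2}^{p-1}\theta_m^{\,m(m-1)}\right\}\int_0^\infty \theta_p^{\,np-1}e^{-\beta\theta_p}\,d\theta_p
= c\,g_1 g_2 g_3\left\{\prod_{m=2}^{p-1}\theta_m^{\,m(m-1)}\right\}\beta^{-np}\,\Gamma(np).
$$

To simplify notation, let ν = (θ_1, …, θ_{p−1}). The power of θ_m, 1 ≤ m ≤ p − 1, collected from g_1, g_2, and ψ(Θ), is m(n − p) + (m − 1) + m(m − 1) = m(n − p + m) − 1. Then

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}
\left[\frac{1}{\lambda_p^2}\bigwedge_{k=p-1}^{1}\left(1 + \frac{\lambda_{k+1}^2}{\lambda_k^2}\theta_k\right)\right]^{-np}
\prod_{i=1}^{p-1}\theta_i^{\,i(n-p+i)-1}\prod_{1 \le i<j \le p}(\theta_i\cdots\theta_{j-1} - 1)^2\,(d\nu),
$$

which is the density stated in the proposition.

Substituting θ_i = l_i²/l_{i+1}², but not doing the change of variables, we can compute this in terms of the sample eigenvalues. We need a few more notes:

$$
\frac{1}{\lambda_p^2}\bigwedge_{k=p-1}^{1}\left(1 + \frac{\lambda_{k+1}^2}{\lambda_k^2}\theta_k\right) = \frac{1}{l_p^2}\sum_{k=1}^{p}\frac{l_k^2}{\lambda_k^2},
$$

$$
\prod_{i=1}^{p-1}\theta_i^{\,i(n-p+i)-1} = (l_1^2)^{n-p}\left\{\prod_{k=2}^{p-1}(l_k^2)^{n-p+2k-1}\right\}(l_p^2)^{1-(p-1)(n-1)},
$$

$$
\prod_{1 \le i<j \le p}(\theta_i\cdots\theta_{j-1} - 1)^2 = \prod_{i<j}\left(\frac{l_i^2 - l_j^2}{l_j^2}\right)^2 = \left\{\prod_{j=2}^{p}(l_j^2)^{-2(j-1)}\right\}\prod_{i<j}(l_i^2 - l_j^2)^2.
$$

Just a little more bookkeeping, and we see that the exponent of l_1² is n − p; the exponent of l_k², 2 ≤ k ≤ p − 1, is (n − p + 2k − 1) − 2(k − 1) = n − p + 1; and the exponent of l_p² is np + 1 − (p − 1)(n − 1) − 2(p − 1) = n − p + 2. Putting it all together, the joint probability density function for ν, written in terms of θ_i = l_i²/l_{i+1}², is

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\left[\sum_{k=1}^{p}\frac{l_k^2}{\lambda_k^2}\right]^{-np}|\det D|^{n-p}\,(l_p^2)^2\left\{\prod_{k=2}^{p-1}l_k^2\right\}\prod_{i<j}(l_i^2 - l_j^2)^2\,(d\nu).
$$

Note that when all the {λ_k²} are equal to a common value λ², the nested sum collapses:

$$
\bigwedge_{k=p-1}^{1}(1 + \theta_k) = 1 + \theta_{p-1} + \theta_{p-1}\theta_{p-2} + \cdots + \theta_{p-1}\cdots\theta_1 = 1 + \frac{1}{l_p^2}\sum_{k=1}^{p-1}l_k^2 = \frac{\operatorname{tr}D}{l_p^2}.
$$

So the second factor collapses to

$$
\left[\frac{1}{\lambda^2}\,\frac{\operatorname{tr}D}{l_p^2}\right]^{-np} = \lambda^{2np}\left[\frac{l_p^2}{\operatorname{tr}D}\right]^{np},
$$

and λ^{2np} cancels [det Λ²]^n = λ^{2np}. So, under the null hypothesis H: λ_1² = λ_2² = ⋯ = λ_p², we get the density function of ν as

$$
dF(\nu \mid \lambda_1^2 = \cdots = \lambda_p^2) = \frac{\pi^{p(p-1)}\,\Gamma(np)\,|\det D|^{n-p}\,(l_p^2)^2\prod_{k=2}^{p-1}l_k^2}{\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)\,[\operatorname{tr}D]^{np}}\prod_{i<j}(l_i^2 - l_j^2)^2\,(d\nu).
$$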
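The exponent bookkeeping in this proof can be checked numerically by evaluating the density both in the ν-form and in the l-form (both without the constant π^{p(p−1)}Γ(np)/([det Λ²]^n CΓ_p(n)CΓ_p(p))) at the same random point. This is a sketch; `nested_sum`, `nu_form`, and `l_form` are hypothetical helpers written for this check. It also confirms that Σ l_k²/λ_k² = (l_p²/λ_p²) ⋀(1 + λ_{k+1}²θ_k/λ_k²).

```python
import math
import random
from itertools import combinations


def nested_sum(nu, lam2):
    """bigwedge_{k=p-1}^{1}(1 + (lam_{k+1}^2/lam_k^2) theta_k), from the innermost bracket outward."""
    acc = 1.0
    for k in range(len(nu)):              # k = 0 is theta_1, the innermost bracket
        acc = 1.0 + (lam2[k + 1] / lam2[k]) * nu[k] * acc
    return acc


def nu_form(nu, lam2, n):
    """Prop. 1 density in nu = (theta_1, ..., theta_{p-1}), constant omitted."""
    p = len(nu) + 1
    beta = nested_sum(nu, lam2) / lam2[-1]
    value = beta ** (-n * p)
    for i in range(1, p):
        value *= nu[i - 1] ** (i * (n - p + i) - 1)
    for i in range(1, p):
        for j in range(i + 1, p + 1):
            value *= (math.prod(nu[i - 1:j - 1]) - 1.0) ** 2   # theta_i ... theta_{j-1} - 1
    return value


def l_form(l2, lam2, n):
    """The same density in terms of l2 = (l_1^2 > ... > l_p^2), constant omitted."""
    p = len(l2)
    s = sum(a / b for a, b in zip(l2, lam2))
    value = s ** (-n * p) * math.prod(l2) ** (n - p) * l2[-1] ** 2 * math.prod(l2[1:-1])
    for a, b in combinations(l2, 2):
        value *= (a - b) ** 2
    return value
```

Agreement of the two forms at random points, for several p, is consistent with the collected exponents n − p, n − p + 1, and n − p + 2 derived above.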
Krishnaiah and Waikar [144] consider simultaneously testing

$$
H_{i,i+1}: \lambda_i^2 = \lambda_{i+1}^2 \quad\text{against}\quad A_{i,i+1}: \lambda_i^2 > \lambda_{i+1}^2
$$

for 1 ≤ i ≤ p − 1. H_{i,i+1} is accepted or rejected according to the comparison of l_i²/l_{i+1}² to a suitably chosen critical value c_α, where

$$
\Pr\left\{1 \le \frac{l_i^2}{l_{i+1}^2} \le c_\alpha,\ 1 \le i \le p-1 \;\middle|\; H\right\} = 1 - \alpha.
$$

The total hypothesis H is accepted if and only if all the component hypotheses are accepted. The power of the test is given by

$$
1 - \Pr\left\{1 \le \frac{l_i^2}{l_{i+1}^2} \le c_\alpha,\ 1 \le i \le p-1 \;\middle|\; A\right\}
$$

where A = ⋃_{i=1}^{p−1} A_{i,i+1}.

The joint density dF(ν) is the appropriate function for computing the required critical values {c_α}. Notice that values for the {λ_i²} must be assumed. Krishnaiah and Waikar [144] provided the test distribution for the case of the real variable Wishart matrix. □
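When the integral of dF(ν) is awkward, c_α can also be approximated by simulation. The sketch below is a swapped-in Monte Carlo technique, not Krishnaiah and Waikar's method. It assumes a single common critical value c_α and simulates the null with Σ = I; under H the distribution of ν does not depend on the common λ². Because the sample eigenvalues are ordered, every ratio is at least 1, so the joint acceptance event reduces to max_i l_i²/l_{i+1}² ≤ c_α. Eigenvalues of the complex Hermitian matrix are obtained from the real symmetric embedding [[X, −Y], [Y, X]] with a cyclic Jacobi solver, to stay dependency free. All names are hypothetical.

```python
import math
import random


def jacobi_eigenvalues(a, tol=1e-22, max_sweeps=60):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi method (descending)."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def hermitian_eigenvalues(h):
    """Eigenvalues of complex Hermitian h via [[X, -Y], [Y, X]] (each eigenvalue appears twice)."""
    p = len(h)
    big = [[0.0] * (2 * p) for _ in range(2 * p)]
    for i in range(p):
        for j in range(p):
            x, y = h[i][j].real, h[i][j].imag
            big[i][j] = big[i + p][j + p] = x
            big[i][j + p] = -y
            big[i + p][j] = y
    return jacobi_eigenvalues(big)[0::2]


def complex_wishart(p, n, rng):
    """A = sum_{k=1}^{n} z_k z_k^H with z_k ~ CN(0, I): the null hypothesis with Sigma = I."""
    a = [[0j] * p for _ in range(p)]
    for _ in range(n):
        z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(p)]
        for i in range(p):
            for j in range(p):
                a[i][j] += z[i] * z[j].conjugate()
    return a


def adjacent_ratio_critical_value(p, n, alpha, trials=4000, seed=11):
    """Monte Carlo c_alpha with Pr{max_i l_i^2 / l_{i+1}^2 <= c_alpha | H} ~= 1 - alpha."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        l2 = hermitian_eigenvalues(complex_wishart(p, n, rng))
        maxima.append(max(l2[i] / l2[i + 1] for i in range(p - 1)))
    maxima.sort()
    return maxima[min(trials - 1, math.ceil((1.0 - alpha) * trials) - 1)]
```

The same simulated eigenvalues could be reused for the statistics of the following subsections; only the reduction to a single maximum or minimum changes.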


6.3.2  Joint Density of the Ratio of an Arbitrary Sample Eigenvalue to the Smallest Sample Eigenvalue


Proposition 2 Let the sample eigenvalues D = diag(l_1², …, l_p²) estimate the population eigenvalues Λ² = diag(λ_1², …, λ_p²) and have the joint density function given by

$$
dF(D) = \frac{|\det D|^{n-p}\,\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\exp\left[-\sum_{k=1}^{p}\frac{l_k^2}{\lambda_k^2}\right]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD).
$$

Then the joint density of ν = (θ_1, …, θ_{p−1}), θ_i = l_i²/l_p², is given by

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\left[\frac{1}{\lambda_p^2} + \sum_{i=1}^{p-1}\frac{\theta_i}{\lambda_i^2}\right]^{-np}
\left[\prod_{i=1}^{p-1}\theta_i^{\,n-p}\right]\left[\prod_{i=1}^{p-1}(\theta_i - 1)^2\right]\left[\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(\theta_i - \theta_j)^2\right](d\nu).
$$

This was motivated by the transformations suggested by Krishnaiah and Waikar related to their equation j.1 [144].

Proof. Change variables from (l_1², …, l_p²) to (θ_1, …, θ_p), where θ_i = l_i²/l_p² for 1 ≤ i ≤ p − 1 and θ_p = l_p². To compute the Jacobian, we note that l_i² = l_p²θ_i = θ_pθ_i = w_i(Θ). Then

$$
\det\left(\frac{\partial w_i}{\partial\theta_j}\right) = \det\begin{pmatrix}
\theta_p & 0 & \cdots & 0 & \theta_1\\
0 & \theta_p & \cdots & 0 & \theta_2\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & \theta_p & \theta_{p-1}\\
0 & 0 & \cdots & 0 & 1
\end{pmatrix} = \theta_p^{\,p-1}.
$$

We also need to chase some messy subscripts and isolate θ_p, since we want to integrate on θ_p. We tackle the messiest one first:

$$
\prod_{i<j}(l_i^2 - l_j^2)^2 = \prod_{i=1}^{p-1}\prod_{j=i+1}^{p}(l_i^2 - l_j^2)^2 = \prod_{i=1}^{p-1}(l_i^2 - l_p^2)^2\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(l_i^2 - l_j^2)^2.
$$

Now substitute in the new variables. We get

$$
\prod_{i<j}(l_i^2 - l_j^2)^2 = \prod_{i=1}^{p-1}(\theta_p\theta_i - \theta_p)^2\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(\theta_p\theta_i - \theta_p\theta_j)^2
= \theta_p^{\,2(p-1)}\theta_p^{\,(p-1)(p-2)}\prod_{i=1}^{p-1}(\theta_i - 1)^2\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(\theta_i - \theta_j)^2.
$$

We engage in some bookkeeping: 2(p − 1) + (p − 1)(p − 2) = (p − 1)[2 + p − 2] = p(p − 1). Therefore,

$$
\prod_{i<j}(l_i^2 - l_j^2)^2 = \theta_p^{\,p(p-1)}\prod_{i=1}^{p-1}(\theta_i - 1)^2\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(\theta_i - \theta_j)^2.
$$

Consider also

$$
\prod_{i=1}^{p}(l_i^2)^{n-p} = \theta_p^{\,n-p}\prod_{i=1}^{p-1}\theta_p^{\,n-p}\theta_i^{\,n-p} = \theta_p^{\,p(n-p)}\prod_{i=1}^{p-1}\theta_i^{\,n-p}.
$$

Within the exponential function,

$$
\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2} = \frac{\theta_p}{\lambda_p^2} + \sum_{i=1}^{p-1}\frac{\theta_p\theta_i}{\lambda_i^2} = \theta_p\left[\frac{1}{\lambda_p^2} + \sum_{i=1}^{p-1}\frac{\theta_i}{\lambda_i^2}\right].
$$

Putting it all together, with the Jacobian, we get

$$
\begin{aligned}
dF(\Theta) &= \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\exp\left\{-\theta_p\left[\frac{1}{\lambda_p^2} + \sum_{i=1}^{p-1}\frac{\theta_i}{\lambda_i^2}\right]\right\}\theta_p^{\,p(n-p)}\left[\prod_{i=1}^{p-1}\theta_i^{\,n-p}\right]\\
&\quad\times\theta_p^{\,p(p-1)}\prod_{i=1}^{p-1}(\theta_i - 1)^2\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(\theta_i - \theta_j)^2\;\theta_p^{\,p-1}\,(d\Theta).
\end{aligned}
$$

We need to collect the powers of θ_p:

p(n − p) + p(p − 1) + (p − 1) = p(n − 1) + p − 1 = pn − 1.

Then

$$
dF(\Theta) = \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\exp\{-\beta\theta_p\}\,\theta_p^{\,np-1}\left[\prod_{i=1}^{p-1}\theta_i^{\,n-p}\right]\prod_{i=1}^{p-1}(\theta_i - 1)^2\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(\theta_i - \theta_j)^2\,(d\Theta).
$$

We want to integrate out θ_p to obtain the joint density of ν. Using lemma 62 we see

$$
\int_0^\infty \theta_p^{\,np-1}\exp(-\beta\theta_p)\,d\theta_p = \beta^{-np}\,\Gamma(np)
$$

where Re(np) > 0 and

$$
\beta = \frac{1}{\lambda_p^2} + \sum_{i=1}^{p-1}\frac{\theta_i}{\lambda_i^2}.
$$

The joint density of ν = (θ_1, …, θ_{p−1}) is therefore the one stated in the proposition.

I want to rewrite this in terms of l_i², but not do a change of variables. First, some details:

$$
\beta = \frac{1}{l_p^2}\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2}, \qquad
\prod_{i=1}^{p-1}\theta_i^{\,n-p} = (l_p^2)^{-(p-1)(n-p)}\prod_{i=1}^{p-1}(l_i^2)^{n-p},
$$

$$
\prod_{i=1}^{p-1}(\theta_i - 1)^2 = (l_p^2)^{-2(p-1)}\prod_{i=1}^{p-1}(l_i^2 - l_p^2)^2, \qquad
\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(\theta_i - \theta_j)^2 = (l_p^2)^{-(p-1)(p-2)}\prod_{i=1}^{p-2}\prod_{j=i+1}^{p-1}(l_i^2 - l_j^2)^2,
$$

so the last two products combine into (l_p²)^{−p(p−1)} ∏_{i<j}(l_i² − l_j²)². Substituting back into the joint density function, we get

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\left[\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2}\right]^{-np}(l_p^2)^{\,np-(p-1)(n-p)-p(p-1)}\left[\prod_{i=1}^{p-1}(l_i^2)^{n-p}\right]\prod_{i<j}(l_i^2 - l_j^2)^2\,(d\nu).
$$

Now to collect the powers of l_p²:

np − (p − 1)(n − p) − p(p − 1) = np − (p − 1)n = n.

We can do just a little more collapsing of terms, since (l_p²)^n ∏_{i=1}^{p−1}(l_i²)^{n−p} = |det D|^{n−p}(l_p²)^p:

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)\,|\det D|^{n-p}\,(l_p^2)^p}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\left[\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2}\right]^{-np}\prod_{i<j}(l_i^2 - l_j^2)^2\,(d\nu).
$$

When we select the null hypothesis H_{i,p}: λ_i² = λ_p² for all i < p, which is λ_1² = λ_2² = ⋯ = λ_p², we get

$$
dF(\nu \mid \lambda_i^2 = \lambda_p^2 \text{ for all } i) = \frac{\pi^{p(p-1)}\,\Gamma(np)\,|\det D|^{n-p}\,(l_p^2)^p}{\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)\,[\operatorname{tr}D]^{np}}\prod_{i<j}(l_i^2 - l_j^2)^2\,(d\nu).
$$
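As with Proposition 1, the power of l_p² collected above can be checked by evaluating the density in both forms at random points, with the common constant omitted. This is a minimal sketch with hypothetical helper names; agreement for several p supports the bookkeeping np − (p − 1)(n − p) − p(p − 1) = n.

```python
import math
import random
from itertools import combinations


def prop2_nu_form(nu, lam2, n):
    """Prop. 2 density in nu = (theta_1, ..., theta_{p-1}), theta_i = l_i^2 / l_p^2, constant omitted."""
    p = len(nu) + 1
    beta = 1.0 / lam2[-1] + sum(t / lam for t, lam in zip(nu, lam2))
    value = beta ** (-n * p) * math.prod(nu) ** (n - p)
    value *= math.prod((t - 1.0) ** 2 for t in nu)
    value *= math.prod((a - b) ** 2 for a, b in combinations(nu, 2))
    return value


def prop2_l_form(l2, lam2, n):
    """|det D|^{n-p} (l_p^2)^p [sum_i l_i^2 / lambda_i^2]^{-np} prod_{i<j} (l_i^2 - l_j^2)^2."""
    p = len(l2)
    s = sum(a / b for a, b in zip(l2, lam2))
    value = math.prod(l2) ** (n - p) * l2[-1] ** p * s ** (-n * p)
    value *= math.prod((a - b) ** 2 for a, b in combinations(l2, 2))
    return value
```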

The alternate hypothesis is A_{i,p}: λ_i² > λ_p² for all i < p.

We follow Krishnaiah and Waikar [144] in constructing the test. We test all {H_{i,p}} against all alternatives {A_{i,p}}. We accept or reject H_{i,p} for 1 ≤ i ≤ p − 1 according to the comparison of the test statistic l_i²/l_p² to the critical value C_α, where

$$
\Pr\left\{1 \le \frac{l_i^2}{l_p^2} < C_\alpha,\ 1 \le i \le p-1 \;\middle|\; H\right\} = 1 - \alpha.
$$

The total hypothesis H is accepted if each individual hypothesis H_{i,p} is accepted. The power of the test is

$$
1 - \Pr\left\{1 \le \frac{l_i^2}{l_p^2} < C_\alpha,\ 1 \le i \le p-1 \;\middle|\; A\right\}
$$

where A = ⋃_{i=1}^{p−1} A_{i,p}.

The joint density dF(ν) is the appropriate function for computing the required critical values {C_α}. Notice that the λ_i² must be assumed. Krishnaiah and Waikar provide the test distribution for the case of the real variable Wishart matrix. □


6.3.3  Joint  Density  of  Ratio  of  Sample  Eigenvalues  to 
Largest  Sample  Eigenvalue 

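Proposition 3 Let the sample eigenvalues D = diag(l_1², …, l_p²) estimate the population eigenvalues Λ² = diag(λ_1², …, λ_p²) and have the joint density function given by

$$
dF(D) = \frac{|\det D|^{n-p}\,\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\exp\left[-\sum_{k=1}^{p}\frac{l_k^2}{\lambda_k^2}\right]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD).
$$

Then the joint density of ν = (θ_2, …, θ_p), θ_i = l_i²/l_1², is given by

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\left[\frac{1}{\lambda_1^2} + \sum_{i=2}^{p}\frac{\theta_i}{\lambda_i^2}\right]^{-np}
\left[\prod_{i=2}^{p}\theta_i^{\,n-p}\right]\left[\prod_{j=2}^{p}(1 - \theta_j)^2\right]\left[\prod_{2 \le i<j \le p}(\theta_i - \theta_j)^2\right](d\nu).
$$

This was motivated by the transformations suggested by Krishnaiah and Waikar [US].

Proof. Change variables from (l_1², …, l_p²) to (θ_1, …, θ_p), where θ_i = l_i²/l_1² for 2 ≤ i ≤ p and θ_1 = l_1². To compute the Jacobian, we note that

$$
l_i^2 = l_1^2\theta_i = \theta_1\theta_i = w_i(\Theta).
$$

Then

$$
\det\left(\frac{\partial w_i}{\partial\theta_j}\right) = \det\begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
\theta_2 & \theta_1 & 0 & \cdots & 0\\
\theta_3 & 0 & \theta_1 & \cdots & 0\\
\vdots & & & \ddots & \\
\theta_p & 0 & 0 & \cdots & \theta_1
\end{pmatrix} = \theta_1^{\,p-1}.
$$

We do some subscript chasing to prepare for cleanly expressing the joint density of the new variables θ_i, and to ease the integration over θ_1.

$$
\prod_{i<j}(l_i^2 - l_j^2)^2 = \prod_{i=1}^{p-1}\prod_{j=i+1}^{p}(l_i^2 - l_j^2)^2 = \prod_{j=2}^{p}(l_1^2 - l_j^2)^2\prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(l_i^2 - l_j^2)^2.
$$

Now substitute the new variables. We get

$$
\begin{aligned}
\prod_{i<j}(l_i^2 - l_j^2)^2 &= \prod_{j=2}^{p}(\theta_1 - \theta_1\theta_j)^2\prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\theta_1\theta_i - \theta_1\theta_j)^2\\
&= \theta_1^{\,2(p-1)}\theta_1^{\,(p-1)(p-2)}\prod_{j=2}^{p}(1 - \theta_j)^2\prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\theta_i - \theta_j)^2\\
&= \theta_1^{\,p(p-1)}\prod_{j=2}^{p}(1 - \theta_j)^2\prod_{2 \le i<j \le p}(\theta_i - \theta_j)^2.
\end{aligned}
$$

For completeness and consistency of mistakes, we look at

$$
\prod_{i=1}^{p}(l_i^2)^{n-p} = \theta_1^{\,n-p}\prod_{i=2}^{p}\theta_1^{\,n-p}\theta_i^{\,n-p} = \theta_1^{\,p(n-p)}\prod_{i=2}^{p}\theta_i^{\,n-p},
\qquad
\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2} = \theta_1\left[\frac{1}{\lambda_1^2} + \sum_{i=2}^{p}\frac{\theta_i}{\lambda_i^2}\right].
$$

Putting this all together with the Jacobian gives us

$$
\begin{aligned}
dF(\Theta) &= \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\exp\left\{-\theta_1\left[\frac{1}{\lambda_1^2} + \sum_{i=2}^{p}\frac{\theta_i}{\lambda_i^2}\right]\right\}\theta_1^{\,p(n-p)}\left[\prod_{i=2}^{p}\theta_i^{\,n-p}\right]\\
&\quad\times\theta_1^{\,p(p-1)}\prod_{j=2}^{p}(1 - \theta_j)^2\prod_{2 \le i<j \le p}(\theta_i - \theta_j)^2\;\theta_1^{\,p-1}\,(d\Theta).
\end{aligned}
$$

We do some more bookkeeping to gather the θ_1 terms:

p(n − p) + p(p − 1) + (p − 1) = p(n − 1) + p − 1 = np − 1.

Thus

$$
dF(\Theta) = \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\,\theta_1^{\,np-1}\exp\{-\beta\theta_1\}\left[\prod_{i=2}^{p}\theta_i^{\,n-p}\right]\prod_{j=2}^{p}(1 - \theta_j)^2\prod_{2 \le i<j \le p}(\theta_i - \theta_j)^2\,(d\Theta).
$$

To get the joint density of a test statistic, we integrate out θ_1. Using lemma 62,

$$
\int_0^\infty \theta_1^{\,np-1}\exp\{-\beta\theta_1\}\,d\theta_1 = \beta^{-np}\,\Gamma(np)
$$

where Re(np) > 0 and

$$
\beta = \frac{1}{\lambda_1^2} + \sum_{i=2}^{p}\frac{\theta_i}{\lambda_i^2}.
$$

Using this result, the joint density function of ν = (θ_2, …, θ_p) is the one stated in the proposition.

I want to write this in terms of l_i², without doing a change of variables. This is for obtaining a computational form in terms of the original variables.

$$
\beta = \frac{1}{l_1^2}\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2}, \qquad
\prod_{j=2}^{p}(1 - \theta_j)^2 = (l_1^2)^{-2(p-1)}\prod_{j=2}^{p}(l_1^2 - l_j^2)^2, \qquad
\prod_{i=2}^{p}\theta_i^{\,n-p} = (l_1^2)^{-(n-p)(p-1)}\prod_{i=2}^{p}(l_i^2)^{n-p},
$$

$$
\prod_{2 \le i<j \le p}(\theta_i - \theta_j)^2 = (l_1^2)^{-(p-1)(p-2)}\prod_{2 \le i<j \le p}(l_i^2 - l_j^2)^2.
$$

Putting it all together, we get the density function for ν in terms of l_i² as

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)\,(l_1^2)^{np}}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\left[\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2}\right]^{-np}(l_1^2)^{-2(p-1)-(n-p)(p-1)-(p^2-3p+2)}\left[\prod_{i=2}^{p}(l_i^2)^{n-p}\right]\prod_{i<j}(l_i^2 - l_j^2)^2\,(d\nu).
$$

Collect the exponents of l_1². We get

np − 2(p − 1) − (n − p)(p − 1) − (p² − 3p + 2)
= np − 2p + 2 − np + n + p² − p − p² + 3p − 2
= n.

Consolidating the l_i² terms, (l_1²)^n ∏_{i=2}^{p}(l_i²)^{n−p} = |det D|^{n−p}(l_1²)^p, gives us

$$
dF(\nu) = \frac{\pi^{p(p-1)}\,\Gamma(np)\,|\det D|^{n-p}\,(l_1^2)^p}{[\det\Lambda^2]^n\,\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)}\left[\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2}\right]^{-np}\prod_{i<j}(l_i^2 - l_j^2)^2\,(d\nu).
$$

When we select the null hypothesis H_{i,1}: λ_i² = λ_1² for all i, which is the same as λ_1² = λ_2² = ⋯ = λ_p², we get

$$
dF(\nu \mid \lambda_i^2 = \lambda_1^2 \text{ for all } i) = \frac{\pi^{p(p-1)}\,\Gamma(np)\,|\det D|^{n-p}\,(l_1^2)^p}{\mathrm{C}\Gamma_p(n)\,\mathrm{C}\Gamma_p(p)\,[\operatorname{tr}D]^{np}}\prod_{i<j}(l_i^2 - l_j^2)^2\,(d\nu).
$$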

The alternate hypothesis is A_{i,1}: λ_i² < λ_1² for all 2 ≤ i ≤ p.

We test all {H_{i,1}} simultaneously against all alternatives {A_{i,1}}. We accept or reject H_{i,1} for 2 ≤ i ≤ p according to the comparison of the test statistic l_i²/l_1² against the critical value C_α, which is appropriately chosen for the desired significance level α such that

$$
\Pr\left\{C_\alpha < \frac{l_i^2}{l_1^2} \le 1,\ 2 \le i \le p \;\middle|\; H\right\} = 1 - \alpha.
$$

The total hypothesis H is accepted if each individual hypothesis is accepted. The power of the test is

$$
1 - \Pr\left\{C_\alpha < \frac{l_i^2}{l_1^2} \le 1,\ 2 \le i \le p \;\middle|\; A\right\}
$$

where A = ⋃_{i=2}^{p} A_{i,1}.

The joint density dF(ν) is the appropriate function for computing the required critical values {C_α}. Notice that the {λ_i²} must be assumed. □
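Once critical values are available, the simultaneous decision rules of Propositions 2 and 3 are mechanical. The sketch below applies them to a given set of sample eigenvalues; the function names are hypothetical, and the critical values in the usage line are placeholders, not values computed from dF(ν).

```python
def smallest_ratio_test(l2, c_alpha):
    """Prop. 2 rule: accept H_{i,p} when 1 <= l_i^2 / l_p^2 < c_alpha; accept H if all are accepted."""
    ordered = sorted(l2, reverse=True)
    ratios = [x / ordered[-1] for x in ordered[:-1]]
    accepted = [1.0 <= r < c_alpha for r in ratios]
    return ratios, accepted, all(accepted)


def largest_ratio_test(l2, c_alpha):
    """Prop. 3 rule: accept H_{i,1} when c_alpha < l_i^2 / l_1^2 <= 1 for 2 <= i <= p."""
    ordered = sorted(l2, reverse=True)
    ratios = [x / ordered[0] for x in ordered[1:]]
    accepted = [c_alpha < r <= 1.0 for r in ratios]
    return ratios, accepted, all(accepted)
```

For example, with eigenvalues (4, 2, 1) and a placeholder C_α = 3, the smallest-ratio rule rejects H_{1,p} because l_1²/l_p² = 4. Because the eigenvalues are ordered, the joint acceptance event depends only on the extreme ratio in each case.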


6.3.4  Joint Density of Ratio of Arbitrary Sample Eigenvalue to Trace of Sample Covariance Matrix

Proposition  4  Let  the  sample  eigenvalues  D  =  diag(/?,  •  •  • , /p)  estimate  the 
population  eigenvalues  =  diag(AJ,  •  •  • ,  Ap)  and  have  the  joint  density  June- 
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tion  given  by 


dF{D)  = 


IdetZ)!”-'’  7rP(P-i) 


exp 


P  12 

E-Jl 
\2 

fc=l 


IK'?  - '?)’ 

i<3 


[[detA^]"  crp(n)crp(p)j 

Then  the  joint  density  o/  ^  =  {62,  •  •  • ,  0p),  is  given  by 

^P(p-i)r(np)  r  p  /  1  1  \  1  -"P 


(dD) 


dF($)  = 


L.=2  ''l; 


[det  A2]’‘  crp(n)crp(p)e 

x[(l-d2 - 0p)02---OX-’> 


i[{i-02 - ej.r-2ej-ej+r - 

i=2 


(d$) 

This  was  motivated  by  the  transformations  suggested  by  Krishnaiah  and  Schu- 


n 

2=i<j 


urmann  [151]. 

Proof.  The  results  and  proof  in  [151]  are  for  a  complex  Wishart  matrix  dis¬ 
tributed  as  CVPp  ,  /p) .  Change  variables  from  (/?,•••,  fp)  to  (i/i ,  •  •  • ,  j/p) 

p 

where  1/1  =  j2  ^'iid  Uk  =  ll  (or  2  <  k  <  p. 

«=i 


/  N 

/ 

\ 

/  \ 

Vi 

1  1 

...  1 

l\ 

1/2 

= 

1 

11 

K  J 

1 

\  p  / 

or  N  =  BL.  The  Jacobian  is  det  B~*  =  1. 

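To form the density function of $N = (\nu_1, \ldots, \nu_p)$ more easily, let us do some bookkeeping first.

$$\begin{aligned}
\ell_1^2 &= \nu_1 - \nu_2 - \cdots - \nu_p \\
\sum_{k=1}^{p}\frac{\ell_k^2}{\lambda_k^2} &= \frac{\nu_1}{\lambda_1^2} + \sum_{i=2}^{p}\left(\frac{1}{\lambda_i^2} - \frac{1}{\lambda_1^2}\right)\nu_i \\
\prod_{i=1}^{p}\ell_i^2 &= (\nu_1 - \nu_2 - \cdots - \nu_p)(\nu_2\nu_3\cdots\nu_p) \\
\prod_{i<j}(\ell_i^2 - \ell_j^2)^2 &= \prod_{j=2}^{p}(\ell_1^2 - \ell_j^2)^2\prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\ell_i^2 - \ell_j^2)^2 \\
\ell_1^2 - \ell_j^2 &= \nu_1 - \nu_2 - \cdots - \nu_{j-1} - 2\nu_j - \nu_{j+1} - \cdots - \nu_p, \quad 2 \le j \le p \\
\ell_i^2 - \ell_j^2 &= \nu_i - \nu_j, \quad i \ne 1
\end{aligned}$$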

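Substituting the new variables and including the Jacobian, we get

$$\begin{aligned}
dF(N) = {} & \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,C\Gamma_p(n)\,C\Gamma_p(p)}\exp\left\{-\frac{\nu_1}{\lambda_1^2} - \sum_{i=2}^{p}\left(\frac{1}{\lambda_i^2} - \frac{1}{\lambda_1^2}\right)\nu_i\right\} \\
& \times [\nu_1\nu_2\cdots\nu_p - (\nu_2 + \cdots + \nu_p)\nu_2\cdots\nu_p]^{n-p} \\
& \times \prod_{j=2}^{p}(\nu_1 - \nu_2 - \cdots - \nu_{j-1} - 2\nu_j - \nu_{j+1} - \cdots - \nu_p)^2 \\
& \times \prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\nu_i - \nu_j)^2\,(dN).
\end{aligned}$$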

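The next step is again a change of variables. Let $\theta_1 = \nu_1$ and let $\theta_k = \nu_k/\nu_1$ for $2 \le k \le p$. Then $\nu_k = \nu_1\theta_k = w_k(\Theta)$. The Jacobian is found by

$$\det\frac{\partial(\nu_1, \ldots, \nu_p)}{\partial(\theta_1, \ldots, \theta_p)} = \det\begin{pmatrix}1 & 0 & \cdots & 0 \\ \theta_2 & \theta_1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ \theta_p & 0 & \cdots & \theta_1\end{pmatrix} = \theta_1^{p-1}.$$

To make the work easier, we do some more bookkeeping.

$$\exp\left[-\frac{\nu_1}{\lambda_1^2} - \sum_{i=2}^{p}\left(\frac{1}{\lambda_i^2} - \frac{1}{\lambda_1^2}\right)\nu_i\right] = \exp[-a\theta_1],$$

where

$$a = \frac{1}{\lambda_1^2} + \sum_{i=2}^{p}\left(\frac{1}{\lambda_i^2} - \frac{1}{\lambda_1^2}\right)\theta_i.$$

Then

$$\begin{aligned}
\nu_1\nu_2\cdots\nu_p - (\nu_2 + \cdots + \nu_p)\nu_2\cdots\nu_p &= \theta_1^p\theta_2\cdots\theta_p - \theta_1^p(\theta_2 + \cdots + \theta_p)\theta_2\cdots\theta_p \\
&= \theta_1^p(1 - \theta_2 - \cdots - \theta_p)\theta_2\cdots\theta_p \\
\nu_1 - \nu_2 - \cdots - \nu_{j-1} - 2\nu_j - \nu_{j+1} - \cdots - \nu_p &= \theta_1[1 - \theta_2 - \cdots - \theta_{j-1} - 2\theta_j - \theta_{j+1} - \cdots - \theta_p] \\
\prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\nu_i - \nu_j)^2 &= \theta_1^{(p-1)(p-2)}\prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\theta_i - \theta_j)^2
\end{aligned}$$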

We have effectively isolated all terms of $\theta_1$. Collect these terms with the powers they are raised to in the density function. We get

$$\theta_1^{p(n-p)}\,\theta_1^{2(p-1)}\,\theta_1^{(p-1)(p-2)}\,\theta_1^{p-1},$$

where the last factor is the Jacobian of the transformation. Then

$$\begin{aligned}
p(n-p) + 2(p-1) + (p-1)(p-2) + (p-1) &= p(n-p) + (p-1)(2 + p - 2 + 1) \\
&= pn - p^2 + p^2 - 1 = np - 1.
\end{aligned}$$
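The Jacobian and the collected power of $\theta_1$ can be spot-checked numerically. The sketch below is only a sanity check under stated assumptions: the helper names (`nu_from_theta`, `numerical_jacobian`, `theta1_exponent`) are hypothetical, the determinant routine is a small hand-written one, and the test point $\Theta$ is arbitrary. It confirms that the map $\nu_1 = \theta_1$, $\nu_k = \theta_1\theta_k$ has Jacobian $\theta_1^{p-1}$ and that the exponent sum equals $np - 1$ over a range of $(n, p)$.

```python
def nu_from_theta(theta):
    """Map Theta = (theta_1, ..., theta_p) to (nu_1, ..., nu_p):
    nu_1 = theta_1 and nu_k = theta_1 * theta_k for k >= 2."""
    t1 = theta[0]
    return [t1] + [t1 * t for t in theta[1:]]


def det(mat):
    """Determinant by Gaussian elimination with partial pivoting (small dense matrices)."""
    a = [list(row) for row in mat]
    size = len(a)
    result = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for k in range(col, size):
                a[r][k] -= factor * a[col][k]
    return result


def numerical_jacobian(theta, h=1e-6):
    """Central-difference approximation of the matrix d nu_i / d theta_j."""
    p = len(theta)
    jac = [[0.0] * p for _ in range(p)]
    for j in range(p):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        f_up, f_down = nu_from_theta(up), nu_from_theta(down)
        for i in range(p):
            jac[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return jac


def theta1_exponent(n, p):
    """Collected power of theta_1: p(n-p) + 2(p-1) + (p-1)(p-2) + (p-1)."""
    return p * (n - p) + 2 * (p - 1) + (p - 1) * (p - 2) + (p - 1)


theta = [2.5, 0.3, 0.2, 0.1]  # arbitrary test point with p = 4
print(det(numerical_jacobian(theta)), theta[0] ** (len(theta) - 1))
print(all(theta1_exponent(n, p) == n * p - 1 for p in range(1, 8) for n in range(p, 20)))
```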


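Collect all the terms to get the density function of $\Theta = (\theta_1, \ldots, \theta_p)$.

$$\begin{aligned}
dF(\Theta) = {} & \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,C\Gamma_p(n)\,C\Gamma_p(p)}\left[(1 - \theta_2 - \cdots - \theta_p)\,\theta_2\cdots\theta_p\right]^{n-p} \\
& \times \prod_{j=2}^{p}(1 - \theta_2 - \cdots - \theta_{j-1} - 2\theta_j - \theta_{j+1} - \cdots - \theta_p)^2 \\
& \times \prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\theta_i - \theta_j)^2 \times \theta_1^{np-1}e^{-a\theta_1}\,(d\Theta),
\end{aligned}$$

where

$$a = \frac{1}{\lambda_1^2} + \sum_{i=2}^{p}\left(\frac{1}{\lambda_i^2} - \frac{1}{\lambda_1^2}\right)\theta_i.$$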


Our goal is achieved by integrating out $\theta_1$ to get the marginal density of $\vartheta = (\theta_2, \ldots, \theta_p)$. Using lemma 62, we see that

$$\int_0^\infty \theta_1^{np-1}e^{-a\theta_1}\,d\theta_1 = a^{-np}\,\Gamma(np) = a^{-np}\,[(np-1)!]$$
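The Gamma integral used here is easy to confirm by quadrature. In this minimal sketch the function names are hypothetical, the values of $(n, p, a)$ are arbitrary test inputs, and the truncation point of the infinite range (about 40 standard deviations past the mean of a Gamma$(np)$ variable with rate $a$) is an assumption of the sketch rather than anything in the derivation.

```python
import math


def gamma_integral_closed_form(n, p, a):
    """a^{-np} Gamma(np) = a^{-np} (np - 1)!, the value quoted from lemma 62."""
    return a ** (-n * p) * math.factorial(n * p - 1)


def gamma_integral_numeric(n, p, a, steps=20000):
    """Composite Simpson approximation of the integral of theta^{np-1} e^{-a theta} over [0, upper]."""
    k = n * p
    # Truncation point (an assumption of this sketch): far enough out that the tail is negligible.
    upper = (k + 40 * math.sqrt(k) + 40) / a
    h = upper / steps  # steps must be even for Simpson's rule
    f = lambda t: t ** (k - 1) * math.exp(-a * t)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


for n, p, a in [(3, 2, 0.7), (4, 3, 1.5)]:
    print(n, p, a, gamma_integral_numeric(n, p, a), gamma_integral_closed_form(n, p, a))
```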


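Therefore, the density of $\vartheta$ is given by

$$\begin{aligned}
dF(\vartheta) = {} & \frac{\pi^{p(p-1)}\,\Gamma(np)}{[\det\Lambda^2]^n\,C\Gamma_p(n)\,C\Gamma_p(p)}\left[\sum_{i=2}^{p}\left(\frac{1}{\lambda_i^2} - \frac{1}{\lambda_1^2}\right)\theta_i + \frac{1}{\lambda_1^2}\right]^{-np} \\
& \times \left[(1 - \theta_2 - \cdots - \theta_p)\,\theta_2\cdots\theta_p\right]^{n-p} \\
& \times \prod_{j=2}^{p}(1 - \theta_2 - \cdots - \theta_{j-1} - 2\theta_j - \theta_{j+1} - \cdots - \theta_p)^2\prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\theta_i - \theta_j)^2\,(d\vartheta).
\end{aligned}$$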

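We want to know how to compute this in terms of our original variables $(\ell_1^2, \ldots, \ell_p^2)$. We do some more bookkeeping.

$$\begin{aligned}
(1 - \theta_2 - \cdots - \theta_p)\,\theta_2\cdots\theta_p &= \frac{\ell_1^2}{\mathrm{tr}\,D}\,\frac{\ell_2^2}{\mathrm{tr}\,D}\cdots\frac{\ell_p^2}{\mathrm{tr}\,D} = \frac{\det D}{(\mathrm{tr}\,D)^p} \\
1 - \theta_2 - \cdots - \theta_{j-1} - 2\theta_j - \theta_{j+1} - \cdots - \theta_p &= \frac{\mathrm{tr}\,D - \ell_2^2 - \cdots - \ell_{j-1}^2 - 2\ell_j^2 - \ell_{j+1}^2 - \cdots - \ell_p^2}{\mathrm{tr}\,D} = \frac{\ell_1^2 - \ell_j^2}{\mathrm{tr}\,D} \\
\theta_i - \theta_j &= \frac{\ell_i^2}{\mathrm{tr}\,D} - \frac{\ell_j^2}{\mathrm{tr}\,D} = \frac{\ell_i^2 - \ell_j^2}{\mathrm{tr}\,D} \\
\sum_{i=2}^{p}\left(\frac{1}{\lambda_i^2} - \frac{1}{\lambda_1^2}\right)\theta_i + \frac{1}{\lambda_1^2} &= \frac{1}{\mathrm{tr}\,D}\sum_{k=1}^{p}\frac{\ell_k^2}{\lambda_k^2}
\end{aligned}$$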


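We collect the powers of $\mathrm{tr}\,D$ as a final bookkeeping task. To simplify, let $x = \mathrm{tr}\,D$. Then, from $a^{-np}$, from $[\det D/x^p]^{n-p}$, and from the two products of squared differences, respectively, we have

$$x^{np}\,x^{-p(n-p)}\,x^{-2(p-1)}\,x^{-(p-2)(p-1)}.$$

Then

$$\begin{aligned}
np - p(n-p) - 2(p-1) - (p-2)(p-1) &= np - p(n-p) - (p-1)(2 + p - 2) \\
&= np - p(n-p) - p(p-1) = np - np + p^2 - p^2 + p = p.
\end{aligned}$$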


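This gives us

$$dF(\vartheta) = \frac{\pi^{p(p-1)}\,\Gamma(np)}{[\det\Lambda^2]^n\,C\Gamma_p(n)\,C\Gamma_p(p)}\left[\sum_{k=1}^{p}\frac{\ell_k^2}{\lambda_k^2}\right]^{-np}[\mathrm{tr}\,D]^p\,[\det D]^{n-p}\prod_{1 \le i < j \le p}(\ell_i^2 - \ell_j^2)^2\,(d\vartheta).$$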

The idea to seek the joint density of

$$\left(\frac{\ell_1^2}{\mathrm{tr}\,D}, \ldots, \frac{\ell_p^2}{\mathrm{tr}\,D}\right)$$

was motivated by Krishnaiah and Schuurmann's suggestion to perform the change of variables $u_i = \ell_i^2/\mathrm{tr}\,D$ for $1 \le i \le p$.
Under the hypothesis

$$\bigcap_{i=2}^{p} H_{1,i} : \lambda_i^2 = \lambda_1^2$$

we see that $dF(\vartheta) = 0$. The alternative hypothesis is given by

$$\bigcup_{i=2}^{p} A_{1,i} : \lambda_i^2 < \lambda_1^2.$$

□
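The final bookkeeping can be checked numerically. In the hedged sketch below, `kernel_theta` and `kernel_ell` (hypothetical names) evaluate $dF(\vartheta)$ without its constant factor $\pi^{p(p-1)}\Gamma(np)/([\det\Lambda^2]^n C\Gamma_p(n)C\Gamma_p(p))$, once in the $\theta$ form and once in the $\ell$ form. The two should agree when $\theta_k = \ell_k^2/\mathrm{tr}\,D$, and the $\ell$ form should be unchanged by $D \to cD$, since its total degree is $-np + p + p(n-p) + p(p-1) = 0$. The eigenvalues used are arbitrary test inputs.

```python
import math


def kernel_theta(theta, lam2, n):
    """dF(vartheta) without its constant, in the theta form.
    theta = (theta_2, ..., theta_p); lam2 = (lambda_1^2, ..., lambda_p^2)."""
    p = len(lam2)
    s = sum(theta)
    a = 1.0 / lam2[0] + sum((1.0 / lam2[i + 1] - 1.0 / lam2[0]) * t for i, t in enumerate(theta))
    val = a ** (-n * p) * ((1.0 - s) * math.prod(theta)) ** (n - p)
    for tj in theta:
        val *= (1.0 - s - tj) ** 2  # 1 - theta_2 - ... - 2 theta_j - ... - theta_p
    for i in range(len(theta)):
        for j in range(i + 1, len(theta)):
            val *= (theta[i] - theta[j]) ** 2
    return val


def kernel_ell(ell2, lam2, n):
    """Same quantity in the ell form:
    [sum_k l_k^2 / lambda_k^2]^{-np} (tr D)^p (det D)^{n-p} prod_{i<j} (l_i^2 - l_j^2)^2."""
    p = len(ell2)
    tr = sum(ell2)
    weighted = sum(x / y for x, y in zip(ell2, lam2))
    val = weighted ** (-n * p) * tr ** p * math.prod(ell2) ** (n - p)
    for i in range(p):
        for j in range(i + 1, p):
            val *= (ell2[i] - ell2[j]) ** 2
    return val


ell2 = [5.0, 2.0, 0.5]  # arbitrary sample eigenvalues l_k^2
lam2 = [4.0, 1.5, 1.0]  # arbitrary population eigenvalues lambda_k^2
n = 5
theta = [x / sum(ell2) for x in ell2[1:]]
print(kernel_theta(theta, lam2, n), kernel_ell(ell2, lam2, n))
print(kernel_ell([3.7 * x for x in ell2], lam2, n))  # rescaled D: same value
```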


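Corollary 4. Let the sample eigenvalues $D = \mathrm{diag}(\ell_1^2, \ldots, \ell_p^2)$ estimate the population eigenvalues $\Lambda^2 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_p^2)$, where $\Lambda^2$ is nonsingular and $\lambda_k^2 \ne \lambda_1^2$ for $k \ge 2$. Let $D$ have the joint density function given by

$$dF(D) = \frac{|\det D|^{n-p}\,\pi^{p(p-1)}}{[\det\Lambda^2]^n\,C\Gamma_p(n)\,C\Gamma_p(p)}\exp\left[-\sum_{k=1}^{p}\frac{\ell_k^2}{\lambda_k^2}\right]\prod_{i<j}(\ell_i^2 - \ell_j^2)^2\,(dD).$$

Let $\theta_1 = \mathrm{tr}\,D$ and $\theta_k = \ell_k^2/\mathrm{tr}\,D$ for $k \ge 2$. Let $\Theta = (\theta_1, \ldots, \theta_p)$ and $\vartheta = (\theta_2, \ldots, \theta_p)$, and let $a$ be the coefficient of $-\theta_1$ in the exponent of the density of $\Theta$, as in the proof of proposition 4. Then the conditional density of $\theta_1$ given $\vartheta$ is

$$dF(\theta_1 \mid \vartheta) = \frac{1}{(np-1)!}\,\theta_1^{np-1}e^{-a\theta_1}a^{np}\,d\theta_1,$$

which is the density function for $\mathrm{tr}\,D$.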


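Proof. From the proof of proposition 4, we have

$$\begin{aligned}
dF(\Theta) = {} & \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,C\Gamma_p(n)\,C\Gamma_p(p)}\left[(1 - \theta_2 - \cdots - \theta_p)\,\theta_2\cdots\theta_p\right]^{n-p} \\
& \times \prod_{j=2}^{p}(1 - \theta_2 - \cdots - \theta_{j-1} - 2\theta_j - \theta_{j+1} - \cdots - \theta_p)^2 \\
& \times \prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\theta_i - \theta_j)^2\,\theta_1^{np-1}e^{-a\theta_1}\,(d\Theta),
\end{aligned}$$

where $a$ is as defined in the proof of proposition 4. Then

$$\begin{aligned}
dF(\vartheta) = {} & \frac{\pi^{p(p-1)}}{[\det\Lambda^2]^n\,C\Gamma_p(n)\,C\Gamma_p(p)}\left[(1 - \theta_2 - \cdots - \theta_p)\,\theta_2\cdots\theta_p\right]^{n-p} \\
& \times \prod_{j=2}^{p}(1 - \theta_2 - \cdots - \theta_{j-1} - 2\theta_j - \theta_{j+1} - \cdots - \theta_p)^2 \\
& \times \prod_{i=2}^{p-1}\prod_{j=i+1}^{p}(\theta_i - \theta_j)^2\,\Gamma(np)\,a^{-np}\,(d\vartheta).
\end{aligned}$$

Therefore

$$dF(\theta_1 \mid \vartheta) = \frac{dF(\Theta)}{dF(\vartheta)} = \frac{\theta_1^{np-1}e^{-a\theta_1}}{\Gamma(np)\,a^{-np}}\,d\theta_1 = \frac{1}{(np-1)!}\,\theta_1^{np-1}e^{-a\theta_1}a^{np}\,d\theta_1.$$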

Under the hypothesis $H_{1,i} : \lambda_i^2 = \lambda_1^2$, the term $a$ is zero, and thus

$$dF(\theta_1 \mid \vartheta) = 0.$$

The alternative is $A_{1,i} : \lambda_i^2 < \lambda_1^2$ for at least one $i \in [2, p]$. □
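The conditional density in corollary 4 is a Gamma density with shape $np$ and rate $a$. As a hedged numerical illustration (hypothetical function names; $a$, $n$, $p$ are arbitrary positive test values, and the truncation of the infinite range is an assumption of the sketch), the code below checks that it integrates to one and has mean $np/a$.

```python
import math


def conditional_density(theta1, a, n, p):
    """a^{np} theta1^{np-1} e^{-a theta1} / (np-1)!, evaluated in log space for stability."""
    k = n * p
    return math.exp(k * math.log(a) + (k - 1) * math.log(theta1) - a * theta1 - math.lgamma(k))


def total_mass_and_mean(a, n, p, steps=50000):
    """Midpoint-rule estimates of the integral of the density and of theta1 times the density."""
    k = n * p
    upper = (k + 40 * math.sqrt(k) + 40) / a  # generous truncation point (assumption)
    h = upper / steps
    mass = mean = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        f = conditional_density(t, a, n, p)
        mass += f * h
        mean += t * f * h
    return mass, mean


mass, mean = total_mass_and_mean(0.8, 3, 2)
print(mass, mean, 3 * 2 / 0.8)
```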


Chapter  7 
SUMMARY AND CONCLUSIONS


There are several conclusions from this research that need to be stated. Of immediate interest are the technical results which apply to the spatial processor order determination problem. The second kind of result from this research consists of the mathematical and statistical tools which are needed by engineers and physicists, but which are usually of little interest to traditional mathematicians and statisticians.


7.1 Results Related Directly to Order Determination

The  immediate  objective  of  this  research  was  to  derive  a  test  statistic  and 
its  distribution  for  determining  the  number  of  significant  sources  observed  by 
an  arbitrary  array  for  a  small  number  of  samples  using  a  hypothesis  testing 
approach. This is the problem of examining whether eigenvalues of a covariance
matrix  from  a  complex  multivariate  Gaussian  distribution  are  significantly 
different.  This  is  the  small  sample  complex  principal  components  problem. 
The  form  of  the  required  test  statistics  has  been  known  for  a  long  time.  The 
challenge  is  to  produce  the  distributions  of  the  desired  test  statistics. 
The problem of finding efficiently computable cumulative distribution functions for appropriate test statistics is still one I have not solved. In this thesis, several distributions relevant to the small sample system order determination problem have been found. These are highlighted below.

An exact solution that we know how to compute, but that makes inefficient use of the data, was constructed as an F-distributed statistic. This is theorem 6. It requires partitioning the data into two independent sets, yielding two independent complex Wishart matrices $W_1$ and $W_2$ with $n$ and $m$ degrees of freedom. Then


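$$\frac{m\,c_1^H W_1 c_1\; c_2^H \Sigma_2 c_2}{n\,c_2^H W_2 c_2\; c_1^H \Sigma_1 c_1} \sim F(2n, 2m),$$

where

$$\frac{2c_1^H W_1 c_1}{c_1^H \Sigma_1 c_1} \sim \chi^2_{2n}, \qquad \frac{2c_2^H W_2 c_2}{c_2^H \Sigma_2 c_2} \sim \chi^2_{2m}.$$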


The values assigned to $\Sigma_1$ and $\Sigma_2$ are those specified in the hypotheses of the test. The form of the distribution simplifies when $n = m$. The cumulative distribution function for the $F(2n, 2n)$ distribution was derived, yielding a closed-form result. This result is documented in theorem 71.
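Theorem 6's statistic and a closed form for the $F(2n, 2m)$ CDF with integer $n, m$ can be illustrated together. The sketch below is an assumption-laden illustration, not the thesis's code. It uses the standard identity $P(F \le x) = I_z(n, m)$ with $z = nx/(nx + m)$, together with the binomial-sum form of the regularized incomplete beta function that holds for integer parameters (this may or may not match the form stated in theorem 71). It simulates $W_1, W_2$ with diagonal $\Sigma_1, \Sigma_2$ and arbitrary test vectors $c_1, c_2$, and compares the empirical $P(T \le 1)$ for $T = m\,c_1^HW_1c_1\,c_2^H\Sigma_2c_2/(n\,c_2^HW_2c_2\,c_1^H\Sigma_1c_1)$ with the closed form. All helper names are hypothetical.

```python
import math
import random


def f_cdf_even(x, n, m):
    """P(F <= x) for F ~ F(2n, 2m), integer n, m >= 1, via I_z(n, m) with z = n x / (n x + m)
    and the binomial-sum form of the regularized incomplete beta function."""
    if x <= 0:
        return 0.0
    z = n * x / (n * x + m)
    big_n = n + m - 1
    return sum(math.comb(big_n, j) * z ** j * (1 - z) ** (big_n - j) for j in range(n, big_n + 1))


def wishart_quadratic_form(c, sigma_diag, dof, rng):
    """c^H W c for W = sum_k z_k z_k^H, z_k ~ CN(0, diag(sigma_diag)).
    Uses c^H (z z^H) c = |c^H z|^2, so W itself never has to be formed."""
    total = 0.0
    for _ in range(dof):
        # complex normal draw: real and imaginary parts each have variance s / 2
        z = [math.sqrt(s / 2) * complex(rng.gauss(0, 1), rng.gauss(0, 1)) for s in sigma_diag]
        total += abs(sum(ci.conjugate() * zi for ci, zi in zip(c, z))) ** 2
    return total


def f_statistic(c1, c2, sig1, sig2, n, m, rng):
    """m c1^H W1 c1 c2^H Sigma2 c2 / (n c2^H W2 c2 c1^H Sigma1 c1) with W1, W2 independent."""
    q1 = wishart_quadratic_form(c1, sig1, n, rng)
    q2 = wishart_quadratic_form(c2, sig2, m, rng)
    s1 = sum(s * abs(ci) ** 2 for s, ci in zip(sig1, c1))
    s2 = sum(s * abs(ci) ** 2 for s, ci in zip(sig2, c2))
    return (m * q1 * s2) / (n * q2 * s1)


rng = random.Random(1994)
n, m = 4, 6
sig1, sig2 = [2.0, 1.0, 0.5], [1.0, 3.0, 1.0]  # hypothesized (and here true) diagonal covariances
c1, c2 = [1 + 0j, 1j, 0.5 + 0j], [0.3 + 0j, 1 - 1j, 2 + 0j]
draws = [f_statistic(c1, c2, sig1, sig2, n, m, rng) for _ in range(10000)]
empirical = sum(d <= 1.0 for d in draws) / len(draws)
print(empirical, f_cdf_even(1.0, n, m))
```

Computing $c^HWc$ as a sum of $|c^Hz_k|^2$ is exactly the reduction that makes each quadratic form a scaled $\chi^2$ variable, which is why the ratio is F-distributed.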

A closely related statistic is for testing hypotheses in MUSIC. This F-statistic is developed in section 6.2.2. It is distributed according to the distribution $F(2n, 2n)$:

$$\frac{c_1^H W_1 c_1\; c_2^H V^H \Sigma V c_2}{c_2^H W_2 c_2\; c_1^H Q^H (\Sigma + ARA^H) Q c_1}$$


Although this may look like a major breakthrough, it really is not. Covariance matrices $\Sigma$ and $R$ must be established as hypotheses for the test. Often the noise covariance is taken to be $\Sigma = \sigma^2 I$, and the vectors $c_1$ and $c_2$ are taken to be of unit length. Further simplification occurs when the hypothesis is $q = 1$ and $A$ is a vector of unit length.

The joint density function of the eigenvalues of one complex Wishart matrix with respect to another complex Wishart matrix ($\det(A - \lambda B) = 0$) was found, paralleling Anderson's result [26] known for the case of real variables. This is theorem 7. Under the null hypothesis of sphericity, this is a piece of the signal subspace method, which is based on examining the eigenstructure of the signal covariance matrix. See section 5.1.

A complexification of another Anderson result provides the joint density of ordered eigenvalues of a Hermitian matrix when the density of the Hermitian matrix is a function of only its eigenvalues. This is theorem 68. This is a powerful result because it allows us to examine generalizations. I have rederived a result of James [120] and Khatri [137] through complexifying Anderson's joint density of the eigenvalues of a matrix distributed as $CW_p(n, I)$. This is theorem 69. This distribution is fairly simple, and it corresponds to the important case of a pre-whitened filter. James' result (theorem 70) for the joint density of the eigenvalues from $CW_p(n, \Sigma)$ is also derived. I am not aware of any derivation in the literature of this distribution done for the complex case without reference to the derivation for the real variables case. James [120] wrote his result down by inspection from the form of the real variables case. Takemura [265] referred to his derivation for the real variables case. The complete derivation for the complex variables case was made possible by the work of Gross and Richards [96].

The  paper  on  zonal  polynomials  of  one  and  two  matrix  arguments  for  the 
combined  cases  of  real,  complex  Hermitian,  and  quaternion  variables  by  Gross 
and  Richards  [96]  is  a  major  key  to  the  pursuit  of  an  expression  for  the  density 
function  of  a  test  statistic  for  the  small  sample  order  identification  problem.  In 
particular,  it  is  this  proof  which  justifies  the  splitting  theorem  (proposition  41) 
for  zonal  polynomials.  It  is  this  splitting  property  that  frees  us  from  the  prison 
of  a  specific  coordinate  system  by  allowing  us  to  integrate  over  all  rotations, 
leaving  us  with  functions  of  only  sample  and  parameter  eigenvalues. 

It  is  the  abstractness  of  the  mathematics  involved  that  allowed  solution  of 
the  problem.  It  was  on  this  point  that  the  validity  of  James’  unproven  result 
[120]  for  the  joint  distribution  of  the  eigenvalues  of  the  sample  covariance 
matrix  for  the  complex  case  hinged  and  had  not  been  established  by  other 
means.  A  contribution  to  the  engineering  community  by  this  thesis  is  the 
narrative  parallel  in  appendix  G  provided  to  Gross  and  Richards’  very  good 
paper.  Their  paper  contains  key  ideas  for  understanding  how  to  investigate 
invariance problems. As a side benefit, it was discovered that their induction method hinged on a group theoretic version of the LDU decomposition, with which engineers are familiar. See equations G.16 through G.25. I also provided an alternate proof of their lemma 5.2 (given in this thesis as theorem 98), which relaxed one of their conditions. See the discussion for equations G.5 through G.11. This result opens the possibility of expressing the required distribution using other sets of polynomials which might be easier to compute or which might converge faster.

Following Muirhead's work [187] for the case of real variables, the joint density of the random variables $(u, \ldots)$ has been derived (given in equation
6.16),  where  u  is  the  statistic  for  testing  sphericity  and  is  given  by  u  = 

It  was  also  shown  that  v  =  tr  A  and  u  are  independent.  See  theorem  10.  The 
density  of  u  for  the  case  of  p  =  2  is  given  in  equation  6.17.  The  cumulative 
distribution  function  for  p  =  2  is  given  in  equation  6.18.  The  density  of  u 
for  the  case  of  p  =  3  was  determined  to  be  computable.  Its  detail  makes  it  a 
suitable  evaluation  problem  for  a  symbolic  mathematics  processor. 
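The defining formula for $u$ is not reproduced in this summary, so the sketch below assumes the common Mauchly-type form $u = \det A/(\mathrm{tr}\,A/p)^p$; the normalization used in equation 6.16 may differ by a constant factor. Under that assumption, it draws a hypothetical $A \sim CW_p(n, I)$, checks that $0 < u \le 1$ (by the arithmetic-geometric mean inequality on the eigenvalues), that $u = 1$ for $A \propto I$, and that $u$ is unchanged when $A$ is rescaled. Scale invariance is one way to see why independence of $u$ and $v = \mathrm{tr}\,A$ is plausible; the independence itself is the content of theorem 10. Helper names are hypothetical.

```python
import math
import random


def det(mat):
    """Determinant of a small complex matrix by Gaussian elimination with partial pivoting."""
    a = [[complex(x) for x in row] for row in mat]
    size = len(a)
    result = 1 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for k in range(col, size):
                a[r][k] -= factor * a[col][k]
    return result


def sample_complex_wishart(p, n, rng):
    """A = sum_k z_k z_k^H with z_k ~ CN(0, I_p): a draw from CW_p(n, I), positive definite a.s. for n >= p."""
    a = [[0j] * p for _ in range(p)]
    for _ in range(n):
        z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(p)]
        for i in range(p):
            for j in range(p):
                a[i][j] += z[i] * z[j].conjugate()
    return a


def sphericity_u(a):
    """ASSUMED form u = det A / (tr A / p)^p; 0 < u <= 1 for Hermitian positive definite A."""
    p = len(a)
    trace = sum(a[i][i].real for i in range(p))
    return det(a).real / (trace / p) ** p


rng = random.Random(2024)
A = sample_complex_wishart(3, 10, rng)
print(sphericity_u(A), sphericity_u([[5 * x for x in row] for row in A]))
```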

The density function for the ratio of averages of disjoint sums of sequential sample eigenvalues of a complex Wishart matrix was examined in section 6.2.1. The density function given as corollary 3 was determined in terms of a partitioning of $S = (C^H U^H \Sigma U C)^{-1}$, where $\Sigma$ is evaluated as specified by the hypothesis. The matrix $C$ defining the linear combinations to be compared is constructed as shown in the example by equation 6.14. Similarly, an expression for the cumulative distribution function is determined, which is given as theorem 9.


det  A 
A number of results motivated by (but not paralleling) Krishnaiah's works [144] [145] [151] were produced in section 6.3 for the case of

$$D = \mathrm{diag}(\ell_1^2, \ldots, \ell_p^2) \sim CW_p(n, \Lambda^2).$$

The results are various ways of using the sample eigenvalues to test whether all the population eigenvalues are equal. The tests differ in the details of the specification of the alternative hypothesis. The first method presented (section 6.3.1) examines the joint density of the ratio of adjacent sample eigenvalues. The second method (section 6.3.2) examines the joint density of the ratio of the sample eigenvalues to the smallest sample eigenvalue. A third method (section 6.3.3) is similar in spirit; it looks at the joint density of the ratio of the sample eigenvalues to the largest sample eigenvalue. A last method (section 6.3.4) examines the ratio of sample eigenvalues to the trace of the matrix. Even with this simplified distribution for $D$, the marginal densities of individual test statistics are difficult to evaluate in general. The densities for the null hypothesis of equal population eigenvalues have been provided. The testing problem is viewed through the mechanism of Roy's union-intersection principle.
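The four families of statistics in section 6.3 are simple functions of the sorted sample eigenvalues. The sketch below computes them and applies the union-intersection rule to the ratio-to-largest family. Assumptions: the orientation of the adjacent ratios ($\ell_{i+1}^2/\ell_i^2$) is a guess, the function names are hypothetical, and the critical values passed in are placeholders, since the thesis obtains $c_{i\alpha}$ from the joint densities rather than from anything computed here.

```python
def ratio_statistics(eigs):
    """Ratios used in sections 6.3.1 to 6.3.4, from sample eigenvalues l_1^2 >= ... >= l_p^2."""
    e = sorted(eigs, reverse=True)
    tr = sum(e)
    return {
        "adjacent": [e[i + 1] / e[i] for i in range(len(e) - 1)],  # 6.3.1 (orientation assumed)
        "to_smallest": [e[i] / e[-1] for i in range(len(e) - 1)],  # 6.3.2
        "to_largest": [e[i] / e[0] for i in range(1, len(e))],     # 6.3.3
        "to_trace": [x / tr for x in e],                           # 6.3.4: entries k >= 2 are theta_k
    }


def accept_equal_eigenvalues(to_largest, critical_values):
    """Union-intersection rule for the ratio-to-largest statistics: the overall hypothesis is
    accepted only if every component hypothesis H_{1,i} is accepted, i.e. every ratio exceeds
    its critical value c_{i,alpha} (small ratios point toward lambda_i^2 < lambda_1^2)."""
    return all(v > c for v, c in zip(to_largest, critical_values))


stats = ratio_statistics([1.0, 4.0, 2.0])
print(stats)
print(accept_equal_eigenvalues(stats["to_largest"], [0.2, 0.2]))  # placeholder critical values
```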

Another contribution is the discussion of the details of the generalized maximum likelihood estimator of Kiefer and Wolfowitz [140] (section 4.2), which are generally unknown in the engineering community. Engineers familiar with their work in stochastic approximation [291] will find the discussion of generalized maximum likelihood estimators to be closely related via the mechanism of convergence of sequences. The generalized maximum likelihood estimator involves an application of the Radon-Nikodym derivative, which is usually studied in a first course in real and complex analysis. Their powerful concept was written as a side comment in an article devoted to the study of statistical consistency. This thesis provides an exposition of their concept as a generalization of the classical hypothesis testing approach and of the estimation approaches others have taken. It is also a fairly nice discussion of the philosophy of what is going on when an estimation problem is solved. It is a conceptual springboard to much more powerful generalizations.


7.2  Complex  Statistics  Tools  for  Engineers 
and  Physicists 

Some  results  which  are  necessary  to  support  the  research  of  this  thesis  have 
broader  application.  These  results  are  collected  in  a  systematic  development 
of  the  statistics  of  complex  random  variables.  I  could  not  have  efficiently 
developed  a  comprehensive  theory  of  the  statistics  of  complex  variables  without 
the  very  good  works  by  Arnold  [31],  Muirhead  [187],  and  others  in  the  real 
variables case. This is a natural evolution of ideas. At the same time, it is cautioned that the extension of real variables results to the complex case requires some care. In particular, the complex multiplication operator imposes a structure on the algebra that goes beyond treating $\mathbb{C}^n$ as merely $\mathbb{R}^{2n}$. This shows up most clearly when dealing with changes of variables and derivatives. It can, however, also be seen just from examining the algebraic theory involved. These differences have been demonstrated.

The tremendous similarity of results between the real and complex cases has occasionally led some extremely talented people to write incorrect results down by inspection. Very few people have worked on the statistics of complex variables, and the reported results of several respected workers are not in agreement. The study of multivariate statistics of complex variables is still young enough that all results should be reexamined for correctness when their use is anticipated. That caution applies explicitly to this thesis as well as to the literature in general. This issue is important to this thesis because I needed specific results. In particular, I needed the density function of the complex Wishart distribution. A contribution of this thesis is the rederivation of this distribution, following two methods used by others (sections E.1.2 and E.1.3), and a third by mathematical induction (section E.1.1) which Arnold [31] applied in the real variables case. The agreement of the results from three different approaches builds confidence that the result is correct. I am pleased to report that Goodman [92] correctly reported the density function (with derivation) for the complex Wishart distribution.
I do not know of any similar systematic development of the distributions and properties of the complex matrix normal distribution (section D.2) and the complex Wishart distribution (section D.3). Among the properties examined in this thesis are the response under linear transformation of variables, conditional distributions, and conditions for independence. I have provided a derivation for the matrix complex normal distribution. The density function for this distribution has been previously reported in the literature without derivation by two well known researchers, and their results were not the same. I am pleased to report that Brillinger [45] correctly reported the complex matrix normal distribution. I have complexified Arnold's results [31] for the distribution of the trace of a linear transformation of a matrix complex normal random variable and the distribution of twice the trace of the argument of the exponential in the density function of the matrix complex normal distribution. The distribution of $2\,\mathrm{tr}(\Sigma^{-1}W)$ is found to be a chi-square variable. Special functionals of the complex Wishart distribution were shown to have a chi-square distribution, and with this observation an F-distributed statistic was constructed from two such independent functionals. A complex version of Hotelling's statistic was also derived (section D.4).
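For $W \sim CW_p(n, \Sigma)$, the statistic $2\,\mathrm{tr}(\Sigma^{-1}W)$ is chi-square with $2np$ degrees of freedom, so it has mean $2np$ and variance $4np$. The Monte Carlo sketch below illustrates this under simplifying assumptions chosen only for brevity: $\Sigma$ is diagonal (so $\mathrm{tr}(\Sigma^{-1}W) = \sum_k\sum_i |z_{ki}|^2/s_i$), and the seed, $\Sigma$, and $n$ are arbitrary. The function name is hypothetical.

```python
import random


def two_trace_statistic(sigma_diag, n, rng):
    """2 tr(Sigma^{-1} W) for W = sum_{k=1}^n z_k z_k^H, z_k ~ CN(0, Sigma), Sigma diagonal.
    With diagonal Sigma, tr(Sigma^{-1} W) = sum_k sum_i |z_ki|^2 / s_i."""
    total = 0.0
    for _ in range(n):
        for s in sigma_diag:
            sd = (s / 2) ** 0.5  # real and imaginary parts each have variance s / 2
            x, y = rng.gauss(0, sd), rng.gauss(0, sd)
            total += (x * x + y * y) / s
    return 2 * total


rng = random.Random(7)
sigma = [3.0, 1.0, 0.25]
n, p = 5, len(sigma)
draws = [two_trace_statistic(sigma, n, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, 2 * n * p)  # chi-square mean: 2np
print(var, 4 * n * p)   # chi-square variance: 2 * (2np)
```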

Another contribution is the development of the properties of a characteristic function in the context of complex variables (section B.4). This was motivated by the definition of the characteristic function of a complex variable given by C. R. Rao [217]. Included in this is the development of expected values of moments for the complex case. An important part of this contribution is the demonstration that the expected values of moments are not found by applying a derivative. Rather, they are found by the complex conjugate of the derivative with respect to the transform variable evaluated at zero. Important cases are worked out. Variations on derivatives of $\det(A + cBT)$ are computed. These are important because of the application to finding the expected value of moments of quantities related to the complex Wishart distribution. For the complex Wishart distribution, the characteristic function used is that of $2W - \Delta(W)$, where $\Delta(W)$ is the matrix of elements on the diagonal of $W$. This fact is used to demonstrate the power of using the characteristic function as a generating function for moments. Various other results are given, including the distribution of $\det(\Sigma^{-1}W)$, $W_1 + W_2$, and $(AW^{-1}A^H)^{-1}$, and some useful expected values such as the expected values of $[\det(W)]^k$, $\det(W^{-1})$, $W$, $\mathrm{tr}(W)$, $(\mathrm{tr}\,W)^2$, and $\mathrm{var}(\mathrm{tr}(W))$. I am in debt to various results (section F.4) by Tague [264] which demonstrate the usefulness of the theory presented. These include the expected values of $W^{-1}AW^{-1}$ and $\mathrm{tr}(W^{-1})$, and $\mathrm{var}[\mathrm{tr}(W^{-1})]$. The work by Tague concludes with an example of computing the signal-to-noise ratio at the output of a beamformer (section F.5).


7.3  Other  Results 
There are a number of ideas developed which are not central to the main research theme, but which had to be developed in order to obtain the tools needed for the main theme research. Some of those which do not fit neatly into the above categories are identified here.

Of greatest practical importance to engineers and physicists are the results concerning the complex case for scalar derivatives of vectors and matrices, vector derivatives, and matrix derivatives (Appendix B). These are based on the observations made regarding the existence of complex derivatives with application of the Cauchy-Riemann conditions. The caution is that many results reported in the literature incorrectly engage in maximization of Hermitian forms by attempting a derivative approach. Final results are often valid because the same result can often be obtained by a completion of squares or projection approach to the problem. However, not all such extrema results are fortunate enough to be valid. Attempting to avoid the derivative existence issue by treating the real and imaginary parts as separate variables is invalid.
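The existence issue can be seen in a few lines: a complex derivative exists only if the difference quotient has the same limit from every direction. The sketch below (hypothetical helper names, arbitrary point $z_0 = 1 + 2i$) shows that for the real-valued $f(z) = |z|^2 = z\bar{z}$, the building block of Hermitian forms, the quotients along the real and imaginary axes differ (approximately $2$ versus $-4i$), so $f$ has no complex derivative there. The holomorphic $f(z) = z^2$ gives the same value $2z_0$ from both directions.

```python
def directional_quotient(f, z0, direction, h=1e-7):
    """Difference quotient (f(z0 + h*d) - f(z0)) / (h*d) along the unit direction d."""
    step = h * direction
    return (f(z0 + step) - f(z0)) / step


def abs_squared(z):
    """f(z) = |z|^2 = z * conj(z): real-valued, as in a Hermitian form."""
    return (z * z.conjugate()).real


def square(z):
    """f(z) = z^2: holomorphic, for comparison."""
    return z * z


z0 = 1 + 2j
for f in (abs_squared, square):
    print(f.__name__, directional_quotient(f, z0, 1), directional_quotient(f, z0, 1j))
```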

Other contributions of importance to engineers and physicists are the related results concerning change of variables for the complex multivariate case (Appendix C). Several important observations are in order. The first and most important observation is that there is no such thing as “the general case”. A matrix without discernible structure is not a general case. You cannot apply the change of variables for an unstructured matrix to a structured matrix and usually get the right result. Any structure that exists in a matrix must be accounted for in a change of variables. Otherwise, your results are simply wrong.

For  nonlinear  change  of  variables  I  copied  Muirhead’s  approach  [187]  of 
using  the  exterior  (wedge)  product  to  simplify  the  algebra.  Mathematicians 
discover  this  operator  in  the  study  of  differential  geometry.  The  wedge  product 
is  a  tool  commonly  used  by  physicists  and  nuclear  engineers,  but  rarely  used 
by  most  other  engineers.  With  this  very  practical  application  to  change  of 
variables  for  the  complex  case,  this  tool  should  become  part  of  the  working  set 
of  knowledge  of  all  engineers  involved  in  acoustic  signal  processing.  This  was  a 
necessary  tool  for  computing  the  Jacobian  for  the  change  of  variables  involving 
matrix quadratics, for example the form $Y = TT^H$. Most of these results are
complexifications  of  results  by  Muirhead  [187],  Arnold  [31],  and  Deemer  and 
Olkin  [67].  Some  of  these  confirm  results  by  Goodman  [92],  or  confirm  results 
or  make  corrections  of  editorial  problems  in  Khatri  [137].  The  development 
of  the  Jacobians  was  a  necessary  part  of  the  derivation  of  the  various  density 
functions  in  this  thesis. 

I have provided many decompositions of complex matrices and related results (appendix M) based on complexifying results done for the real case by others. These include special results for complex triangular matrices, eigenvalue decompositions with related results, proof of the relationship between the eigenvalues, determinants, and the trace of a matrix, square-root decompositions, polar decompositions, Cholesky decomposition, singular value decomposition, and various relationships between eigenvalues of $X$, $(aI + bX)$, and $(aI + bX)^{-1}$. Many of these results can be found in Stewart [259].

The various constructions of a complex vector space (appendix J) drive home the fact that $\mathbb{C}^n$ is not merely $\mathbb{R}^{2n}$, although I have seen the construction of a vector in $\mathbb{R}^{2n}$ by others, and have seen the scalar

$$\begin{pmatrix} x & -y \\ y & x \end{pmatrix}$$

by others. This work was motivated by examples from Nomizu [193].
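The $2 \times 2$ matrix above is the real representation of multiplication by the complex scalar $x + iy$. A minimal sketch (hypothetical helper names, arbitrary test values) checks that complex multiplication corresponds to the matrix product and to the matrix acting on $(\mathrm{Re}\,w, \mathrm{Im}\,w)$, and that a generic real $2 \times 2$ matrix is not of this form. This is one concrete sense in which $\mathbb{C}^n$ carries more structure than $\mathbb{R}^{2n}$: only the real-linear maps that commute with multiplication by $i$ are complex-linear.

```python
def as_real_matrix(z):
    """Real 2x2 matrix of multiplication by z = x + iy acting on (Re w, Im w)."""
    return [[z.real, -z.imag], [z.imag, z.real]]


def matmul(a, b):
    """Product of two 2x2 real matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def apply(m, w):
    """Act with a 2x2 real matrix on the pair (Re w, Im w) and return the result as a complex number."""
    x, y = w.real, w.imag
    return complex(m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)


def is_complex_scalar(m, tol=1e-12):
    """True iff m has the form [[x, -y], [y, x]], i.e. m commutes with multiplication by i."""
    return abs(m[0][0] - m[1][1]) < tol and abs(m[0][1] + m[1][0]) < tol


z1, z2 = 1 + 2j, 3 - 1j
print(matmul(as_real_matrix(z1), as_real_matrix(z2)) == as_real_matrix(z1 * z2))
print(apply(as_real_matrix(z1), z2) == z1 * z2)
print(is_complex_scalar([[1.0, 2.0], [3.0, 4.0]]))
```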


A few integrals were computed (appendix P), some of which do not appear in Gradshteyn and Ryzhik [94]. These were done in support of evaluation of a cumulative distribution function. Evaluation of the integral in theorem 147 was tedious, but it is one most sophomores can do. Generalized even and generalized odd functions were defined (definitions 84 and 85) and their elementary properties demonstrated (section P.2.6). The integral $\int u^m(1-u)^n\,du$ was computed and interpreted as an expansion in terms of the probability of $k$ failures in $m$ trials when $0 < u < 1$ (proposition 103). The complexification of Muirhead's matrix Laplace transform [187] of $(\det A)^{a-m}$ over positive definite $A$ is developed as theorem 150. This is the integral

$$\int_{A>0}\mathrm{etr}(-\Sigma^{-1}A)\,(\det A)^{a-m}\,(dA) = (\det\Sigma)^a\,C\Gamma_m(a)$$
This provides an alternative interpretation of the complex Wishart density as the entire integrand in the normalized transform. This is merely the complexification of a relationship known by specialists working in the real variables case, yet it is an important one.
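The constant $C\Gamma_m(a)$ also appears in the normalizing constants of section 6.3. Assuming the standard definition $C\Gamma_m(a) = \pi^{m(m-1)/2}\prod_{j=1}^{m}\Gamma(a - j + 1)$ for $a > m - 1$ (the thesis's own definition is not restated in this section), the sketch below computes it in log space, checks the recursion $C\Gamma_m(a) = \pi^{m-1}\Gamma(a)\,C\Gamma_{m-1}(a-1)$, and checks the $m = 1$ case of the transform, $\int_0^\infty e^{-x/\sigma}x^{a-1}\,dx = \sigma^a\Gamma(a)$, by quadrature. Function names, test values, and the truncation point are assumptions of the sketch.

```python
import math


def log_complex_mvgamma(a, m):
    """log C Gamma_m(a) = (m(m-1)/2) log(pi) + sum_{j=1}^m log Gamma(a - j + 1), for a > m - 1."""
    if a <= m - 1:
        raise ValueError("C Gamma_m(a) requires a > m - 1")
    return 0.5 * m * (m - 1) * math.log(math.pi) + sum(math.lgamma(a - j + 1) for j in range(1, m + 1))


def scalar_laplace_numeric(a, sigma, steps=100000):
    """Midpoint rule for the integral of exp(-x / sigma) x^(a - 1) over (0, inf), the m = 1 case."""
    upper = (a + 40 * math.sqrt(a) + 40) * sigma  # truncation point (assumption)
    h = upper / steps
    return sum(math.exp(-(i + 0.5) * h / sigma) * ((i + 0.5) * h) ** (a - 1) for i in range(steps)) * h


print(math.exp(log_complex_mvgamma(3.5, 1)), math.gamma(3.5))
print(scalar_laplace_numeric(3.5, 2.0), 2.0 ** 3.5 * math.gamma(3.5))
```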

Other miscellaneous contributions include the generalized definition of the nested operator (definition 87) and development of the trigonometry of complex matrices (appendix N). The definition of the nested operator $\mathop{A}_{k=1}^{n}$ is a generalization of Tuma's nested operator (section 8.11) [268], which has application in one of the test distribution density functions (proposition 1),
and  has  application  in  recursive  solutions  of  problems.  I  used  the  definition 
of  to  complexify  results  by  Curtis  [64],  which  includes  work  regarding  the 
matrix logarithm (section N.1).

I generalized the discussion by James [120] to show that his set of three simultaneous mappings can be formalized into the setting of a topological group theory (section H.6). I defined a group $G$ whose elements are pairs of matrices with a special operator (section H.6.1). I then defined a set $A$ upon which this group acts, where $A$ happens to be the set of all complex multivariate normal distributions (section H.6.3). It is this generalization that justifies the application of the machinery of group representation theory. It makes explicit that we really are operating on distributions and not just parameters of a distribution.
7.4 Linear Algebra Results Verified for the Complex Case

Some results for complex variables are true merely because the space under consideration is a linear space over an arbitrary field. I verified all results I needed, not knowing ahead of time whether a result depended only on the linear algebra for an arbitrary field, or whether some modification was needed to specialize the result to the complex case. The following results do not differ between the real and the complex cases:

1. Partitioned matrix right and left inverses (section K.3.1).

2. Partitioned matrix determinants (section K.4.2).

3. Eaton's lemma 1.35 [73]: $\det(I_n + AB) = \det(I_m + BA)$, where $A$ is $n \times m$ and $B$ is $m \times n$, with variations (lemma K.4.3).
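Eaton's identity, with $A$ of size $n \times m$ and $B$ of size $m \times n$, can be spot-checked on random complex matrices. The sketch below is a numerical check only: the determinant routine is hand-written, the helper names are hypothetical, and the seed and sizes are arbitrary. It compares a $3 \times 3$ determinant with a $2 \times 2$ one.

```python
import random


def det(mat):
    """Determinant of a small complex matrix by Gaussian elimination with partial pivoting."""
    a = [[complex(x) for x in row] for row in mat]
    size = len(a)
    result = 1 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for k in range(col, size):
                a[r][k] -= factor * a[col][k]
    return result


def matmul(a, b):
    """Product of an r x s and an s x t matrix stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def identity_plus(m):
    """I + m for a square matrix m."""
    return [[m[i][j] + (1 if i == j else 0) for j in range(len(m))] for i in range(len(m))]


rng = random.Random(73)


def random_complex():
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))


n, m = 3, 2
A = [[random_complex() for _ in range(m)] for _ in range(n)]  # n x m
B = [[random_complex() for _ in range(n)] for _ in range(m)]  # m x n
left = det(identity_plus(matmul(A, B)))   # det(I_n + AB), a 3 x 3 determinant
right = det(identity_plus(matmul(B, A)))  # det(I_m + BA), a 2 x 2 determinant
print(left, right)
```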


7.5  Other  Simple  Results 

The results identified here are mundane. I have not bothered to see if anyone else has produced them. They are useful, but not challenging.

1. Properties of skew-Hermitian matrix $A = -A^H$ (definition 77).

2.  Proof  that  if  A  is  positive  definite  then  A  is  Hermitian  (proposition  48). 
3. Many explicit expansions of the trace of a product of various matrices (section K.2). These were developed to support evaluation of functions of the complex Wishart distribution via the method of differential functions of the related characteristic function.


4.  Complex  matrix  inversion  lemmas  (Section  K.3.2). 

5. Expression of the $(p, p)$ element of an inverse matrix in terms of the elements of the original $p \times p$ matrix (lemma 41).

6. Proofs that $[\det(A)]^{-1} = \det(A^{-1})$ (proposition 57), $\det(A^*) = (\det A)^*$ (lemma 42), and $\det(A^H) = (\det A)^* = (\det A)^H$ (proposition 58).

7. Proof that for unitary $A$, $\det A = e^{i\theta}$ for some $\theta \in \mathbb{R}$ (lemma 43).

8. Proof that for an orthonormal complex matrix $A$, $\det A = \pm 1$ (lemma 44).


9. From proposition 61:

$$\det\begin{pmatrix} A & B \\ I & I \end{pmatrix} = \det(A - B)$$

10. From proposition 62:

$$\det\begin{pmatrix} I & I \\ B & A \end{pmatrix} = \det(A - B)$$
11. From proposition 63:

$$\det\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} = (\det A)(\det B)$$


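12. From lemma 49:

$$\det\begin{pmatrix} A \otimes I_p & C \otimes I_p \\ B \otimes I_p & D \otimes I_p \end{pmatrix} = \det\left[\begin{pmatrix} A & C \\ B & D \end{pmatrix} \otimes I_p\right] = \left[\det\begin{pmatrix} A & C \\ B & D \end{pmatrix}\right]^p$$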

13. From proposition 65:

$$\det(I + X^2) = |\det(I + iX)|^2$$
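Several of the determinant facts in this list are easy to spot-check numerically. The sketch below (pure Python, hand-written complex determinant, arbitrary $2 \times 2$ test blocks, hypothetical helper names) checks $\det\begin{pmatrix}A & B\\ I & I\end{pmatrix} = \det(A - B)$, $\det\begin{pmatrix}A & 0\\ 0 & B\end{pmatrix} = \det A\,\det B$, $|\det U| = 1$ for a unitary $U$, and $\det(I + X^2) = |\det(I + iX)|^2$ for a Hermitian $X$. Taking $X$ Hermitian in the last identity is an assumption of this sketch; it is what makes $\det(I - iX)$ the conjugate of $\det(I + iX)$.

```python
import cmath


def det(mat):
    """Determinant of a small complex matrix by Gaussian elimination with partial pivoting."""
    a = [[complex(x) for x in row] for row in mat]
    size = len(a)
    result = 1 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for k in range(col, size):
                a[r][k] -= factor * a[col][k]
    return result


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def combine(a, b, s=1):
    """Entrywise a + s * b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(tl, tr, bl, br):
    """Assemble the block matrix [[tl, tr], [bl, br]] from square blocks of equal size."""
    return [r1 + r2 for r1, r2 in zip(tl, tr)] + [r1 + r2 for r1, r2 in zip(bl, br)]


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
A = [[1 + 1j, 2], [0.5j, -1]]
B = [[3, -1j], [1 - 2j, 0.25]]

# det [[A, B], [I, I]] = det(A - B)
item9 = (det(block(A, B, I2, I2)), det(combine(A, B, -1)))
# det [[A, 0], [0, B]] = det(A) det(B)
item11 = (det(block(A, Z2, Z2, B)), det(A) * det(B))
# U = e^{i phi} [[a, -conj(b)], [b, conj(a)]] with |a|^2 + |b|^2 = 1 is unitary
phase = cmath.exp(0.7j)
a, b = (1 + 1j) / 2, (1 - 1j) / 2
U = [[phase * a, -phase * b.conjugate()], [phase * b, phase * a.conjugate()]]
item7 = abs(det(U))
# Hermitian X: det(I + X^2) = |det(I + iX)|^2
X = [[2, 1 - 1j], [1 + 1j, -1]]
item13 = (det(combine(I2, matmul(X, X))), abs(det(combine(I2, X, 1j))) ** 2)
print(item9, item11, item7, item13)
```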


7.6  Proofs  for  Results  Stated  by  Others 

1. Littlewood p. 19 [167]. If $A = -A^H$, then $(I + A)(I - A)^{-1}$ is unitary (proposition 59).

2. Littlewood p. 19 [167]. If $B$ is unitary and $-1$ is not a characteristic root of $B$, then there exists $A = -A^H$ such that $B = (I + A)(I - A)^{-1}$ (proposition 60).
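
Both of Littlewood's statements describe the Cayley transform and its inverse. The following minimal sketch assumes Python with NumPy and uses the hypothetical helper names `cayley` and `inverse_cayley`. It builds a random skew-Hermitian $A$, confirms that $B = (I + A)(I - A)^{-1}$ is unitary, and recovers $A = (B - I)(B + I)^{-1}$. Because the eigenvalues of a skew-Hermitian matrix are purely imaginary, $I - A$ is always invertible. Likewise, $-1$ is never an eigenvalue of the resulting $B$.

```python
import numpy as np

def cayley(A):
    """(I + A)(I - A)^{-1}; unitary when A is skew-Hermitian (A = -A^H)."""
    I = np.eye(A.shape[0])
    return (I + A) @ np.linalg.inv(I - A)

def inverse_cayley(B):
    """A = (B - I)(B + I)^{-1}; requires that -1 is not an eigenvalue of B."""
    I = np.eye(B.shape[0])
    return (B - I) @ np.linalg.inv(B + I)

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M - M.conj().T) / 2                          # skew-Hermitian: A^H = -A
B = cayley(A)
print(np.allclose(B.conj().T @ B, np.eye(4)))     # B is unitary
A_back = inverse_cayley(B)
print(np.allclose(A_back, A), np.allclose(A_back, -A_back.conj().T))
```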


7.7  More  Results 

1. Two examples of structures involving Hermitian matrices failing to form a group (section H.5).




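2. Proof that $A(\operatorname{adj} A) = (\det A) I_n$ for the complex case (proposition 56). The point is that even for the complex case

$$\operatorname{adj}(A) = \left[(-1)^{i+j}\det(X_{ij})\right]^T$$

rather than $\left[(-1)^{i+j}\det(X_{ij})\right]^H$, where $X_{ij}$ is the submatrix obtained by deleting row $i$ and column $j$ of $A$, so that $\det(X_{ij})$ is the minor of $a_{ij}$. This is a good example of how intuition and experience cannot be trusted to guide the conversion of methods from real to complex variables.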

3.  Demonstration  that  the  orthonormal  bases  produced  by  the  Gram-Schmidt 
orthonormalization  process  (appendix  L)  are  not  unique,  but  depend 
upon  the  bilinear  operator  used  in  the  algorithm,  and  that  this  bilinear 
operator  is  not  required  to  be  an  inner  product.  This  demonstrates  that 
the  property  of  having  an  orthonormal  basis  for  a  vector  space  does  not 
imply  that  the  space  is  an  inner  product  space. 
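
Items 2 and 3 can both be illustrated numerically. The sketch below assumes Python with NumPy, and `adjugate` and `gram_schmidt` are hypothetical helper names. It first builds the adjugate as the plain transpose of the cofactor matrix and confirms that $A(\operatorname{adj} A) = (\det A)I$ for a random complex $A$. It then runs the same Gram-Schmidt loop with two operators:

- the Hermitian inner product $u^H v$;
- the symmetric bilinear form $u^T v$.

The form $u^T v$ is my choice of an operator that is not an inner product on a complex space: it is not conjugate symmetric, and $u^T u = 0$ for $u = (1, i)$. The two runs give different bases, each orthonormal with respect to its own operator.

```python
import numpy as np

def adjugate(A):
    """Transpose (not conjugate transpose) of the cofactor matrix."""
    n = A.shape[0]
    cof = np.empty_like(A, dtype=complex)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

def gram_schmidt(vectors, form):
    """Gram-Schmidt with an arbitrary symmetric or Hermitian operator form(u, v)."""
    basis = []
    for v in vectors:
        w = v.astype(complex)
        for q in basis:
            w = w - form(q, w) * q            # remove the component along q
        norm2 = form(w, w)                     # complex in general for a bilinear form
        basis.append(w / np.sqrt(complex(norm2)))
    return np.column_stack(basis)

hermitian_ip = lambda u, v: np.vdot(u, v)      # u^H v: an inner product
bilinear = lambda u, v: u @ v                  # u^T v: symmetric bilinear, not an inner product

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
print(np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(4)))

X = [A[:, k] for k in range(3)]
Q_h = gram_schmidt(X, hermitian_ip)
Q_b = gram_schmidt(X, bilinear)
print(np.allclose(Q_h.conj().T @ Q_h, np.eye(3)))   # orthonormal under u^H v
print(np.allclose(Q_b.T @ Q_b, np.eye(3)))          # orthonormal under u^T v
print(np.allclose(Q_b.conj().T @ Q_b, np.eye(3)))   # generally not orthonormal under u^H v
```

Using `.conj().T` in place of `.T` inside `adjugate` breaks the first check. That failure is the point made in item 2.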

7.8 Comparisons

The theory I have pursued is not yet fully developed. When selecting methods for systems being designed today (1994), use a different approach, such as one of the established estimation approaches.

When constrained to serial processors, the estimation approach should conceptually yield quicker results than my sequential testing approach. However, other estimation approaches are often sequential as well: it is common to construct a family of estimates and pick the “best” estimator from that family. As technology provides economical and practical parallel processors, this difference between the approaches will become less important.
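
The "family of estimates" pattern can be made concrete with one well-known information theoretic estimator: the MDL criterion of Wax and Kailath for complex data. It is not a method developed in this thesis. The sketch below assumes Python with NumPy. The function name `mdl_order` and the sample eigenvalues are illustrative only. The sketch scores every candidate order $k = 0, \ldots, p-1$ from the sample covariance eigenvalues and returns the minimizer.

```python
import numpy as np

def mdl_order(eigenvalues, n_snapshots):
    """Wax-Kailath MDL estimate of the number of sources from sample eigenvalues.

    Scores the whole family of candidate orders k = 0, ..., p-1 and returns
    the minimizing k together with all of the scores.
    Assumes strictly positive eigenvalues.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending
    p, N = len(lam), n_snapshots
    scores = []
    for k in range(p):
        tail = lam[k:]                                # the p - k smallest eigenvalues
        geo = np.exp(np.mean(np.log(tail)))           # geometric mean
        arith = np.mean(tail)                         # arithmetic mean
        loglik = -N * (p - k) * np.log(geo / arith)   # zero when the tail is flat
        penalty = 0.5 * k * (2 * p - k) * np.log(N)   # free parameters for complex data
        scores.append(loglik + penalty)
    return int(np.argmin(scores)), scores

order, scores = mdl_order([10.0, 5.0, 1.0, 1.0, 1.0, 1.0], n_snapshots=100)
print(order)  # 2
```

AIC follows the same loop with twice the likelihood term and a penalty of $2k(2p - k)$. Only the penalty changes, which is where the consistency debate discussed in section 8.2.2 arises.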

The more knowledge you have, the better the decision you can make and the better you can assess the quality of that decision. The information theoretic approaches used now do not require establishing measures of effectiveness for the quality of the estimate produced, although conceptually this can be done by providing confidence intervals. The hypothesis testing approach requires explicit identification of the allowable error.

Is  one  method  better  than  the  other?  It  depends  on  your  purpose.  For 
applications,  it  also  depends  on  available  technology. 


Chapter  8 
FURTHER RESEARCH

Suggestions for further work identified in this chapter are of two types. The first type is covered in the section titled Extending This Research, which focuses on the work needed to continue progress on the small sample order identification problem. Recommendations of the second type focus on developing ideas and tools less directly related to those used in this thesis. I think identifying these less directly applicable ideas is vitally important for the broader advancement of engineering and science.

The small sample order identification problem spans several areas that desperately need more work. The small sample statistics of complex matrix random variables is still an area that has received little attention compared to the real variables case. The basic question of what properties a small sample test statistic should have needs to be examined.

Mathematical tools for constructing the needed distributions need to be collected, cataloged, and more extensively developed. This includes systematically collecting or developing complex matrix algebra and calculus beyond what was done in this thesis. Much of this work already exists in abstract form or is scattered throughout the literature.

Careful reading of the literature is strongly recommended:

- Results identified as applying to the complex case may assume complex symmetric matrices rather than complex Hermitian matrices. This is particularly true in the literature dealing with group theory or zonal polynomials.
- Conditions for the existence of complex derivatives are often ignored, resulting in errors in the literature, particularly in the area of adaptive beamforming. It is not uncommon to see gradients used in optimization and search algorithms applied erroneously in the complex case. Treating the real and imaginary parts of complex variables separately in a gradient is not a valid way to sidestep the requirement that the complex derivative exist, and the optimization methods depend on that derivative.
- Jacobians for complex changes of variables reported in the literature are not reliable.
- Similarly, distributional results are not yet reliably reported.

Caveat emptor. Progress in related areas in the last few years makes completing a theory of complex multivariate analysis of stochastic variables even more urgent. This effort will greatly benefit people studying acoustics, signal processing, and other areas. It needs to be made accessible to senior engineering undergraduates. Specific areas of mathematical knowledge must be developed further. In particular, progress in extending our knowledge of zonal polynomials of several complex matrices is urgently needed. There is plenty of work yet to be done.
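
A short example makes the existence issue concrete. Consider the real-valued function $f(z) = |z|^2 = z\bar z$, which is the scalar analogue of a quadratic cost such as $w^H R w$. It has no complex derivative except at $z = 0$. At $z = x + iy$, its difference quotient tends to $2x$ along the real axis but to $-2iy$ along the imaginary axis. The sketch below assumes plain Python, and `difference_quotient` is a hypothetical helper name. It shows the two directional limits disagreeing at $z_0 = 1 + 2i$.

```python
def difference_quotient(f, z0, direction, h=1e-6):
    """(f(z0 + h*d) - f(z0)) / (h*d) for a small step h along the direction d."""
    step = h * direction
    return (f(z0 + step) - f(z0)) / step

f = lambda z: (z * z.conjugate()).real     # |z|^2: a real-valued cost of a complex variable
z0 = 1 + 2j
along_real = difference_quotient(f, z0, 1)
along_imag = difference_quotient(f, z0, 1j)
print(along_real, along_imag)   # close to 2 and to -4j: no single complex derivative exists
```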


8.1  Extending  This  Research 
This thesis has provided the tools for developing an application theory. However, the theory is not yet mature enough for easy experimental or simulation use, because cumulative probability distribution functions are difficult to evaluate. Research is still in the embryonic stage. The continuing theoretical work by Gross and Richards and the continued use in applications by Tague are providing the theory, interest, and pressure that will eventually produce practical results.

8.1.1  Some  References  to  Consider 

A number of references would be worth careful consideration with respect to the statistical problems related to order estimation:

- Saw’s 1977 paper [233] proposed a method of computing zonal polynomials. This motivated further work on a problem that was once thought to be intractable.
- Farrell [80] reported on the calculation of complex zonal polynomials in 1980, and the paper includes some tables. Understanding his paper requires understanding group characters, bisymmetric matrices, Young’s diagrams, and Haar measure.
- Saw’s 1984 paper [234] establishes the connection between the ultraspherical polynomials and distributions on the m-sphere.
- Kushner and Meisner [159] reported in 1984 on integral and differential formulas for zonal polynomials.
- Watson’s 1986 paper [277] discusses estimation theory on the sphere in a Bayesian setting.
- Yu’s 1991 paper [295] on recursive updating of the eigenvalue decomposition of a covariance matrix may represent a significant contribution. It would be of interest to determine how it affects the rank determination problem and the assumption of independent samples currently made in forming the test statistic.
- Shenoy’s 1991 Ph.D. thesis [240] on group representations and optimal recovery in signal modeling deals with concepts that have repeatedly surfaced in the background reading required to understand the work of Gross and Richards [96]. This indicates that it deserves careful attention.


8.1.2 Connecting Gross and Richards’ Work to Stein and Weiss’ Work

The gap between abstract theory and engineering application will narrow once the work by Stein and Weiss [258] and the work by Gross and Richards [96] are related in terms understandable to a well-trained engineer. This is the most urgent and productive next step.

Krantz’ presentation of Stein and Weiss included some wonderful geometric interpretations of spherical harmonics and zonal polynomials. Stein and Weiss did their work for the case of real variables, and their development stopped short of the splitting theorem used by James [120]. The work by Gross and Richards was done in a more general setting, yet it did not include generalizations of some key insights developed by Stein and Weiss.

These connections must be made by someone who understands both the traditional development of special functions and the background material supporting Gross and Richards. It would be nice to complexify Stein and Weiss and to include the splitting theorem and other developments. The abstraction of Stein and Weiss’ axis of spherical rotation needs to be determined in the Gross and Richards framework.

The splitting theorem needs to be examined to determine whether it truly establishes an equality (i.e., both $\Leftarrow$ and $\Rightarrow$ hold in the derivation), or whether the correct statement of the theorem is only $\Rightarrow$. The practical consequence is as follows. We know from the work of Gross and Richards [96] that a complex zonal polynomial of Hermitian matrix argument, $Z_m(W)$, has the same value as when the argument is the matrix of the eigenvalues of $W$. Suppose this is combined with the splitting theorem and the theorem is bidirectional. Then the definitions of generalized hypergeometric functions of one and two matrix arguments lead to $\exp(\operatorname{tr}\Sigma^{-1}W) = \exp(\operatorname{tr}\Lambda^{-1}L)$. Here $\Lambda$ is the diagonal matrix of eigenvalues of $\Sigma$, and $L$ is the diagonal matrix of eigenvalues of $W$. I have generated a numerical counterexample to this.
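
A counterexample of this kind takes only a few lines. The identity being tested is $\exp(\operatorname{tr}\Sigma^{-1}W) = \exp(\operatorname{tr}\Lambda^{-1}L)$, where $\Lambda$ and $L$ are the diagonal eigenvalue matrices of $\Sigma$ and $W$. The symbols are my reading of a garbled display. Because $\exp$ is one-to-one on the reals, it is enough to compare the traces. Moreover, $\operatorname{tr}\Sigma^{-1}W$ depends on the eigenvectors as well as on the eigenvalues. The $2 \times 2$ example below is my own, not the author's counterexample. It assumes Python with NumPy, and `both_sides` is a hypothetical helper name.

```python
import numpy as np

def both_sides(Sigma, W):
    """Return tr(Sigma^{-1} W) and tr(Lambda^{-1} L), eigenvalues paired in ascending order."""
    lhs = np.trace(np.linalg.solve(Sigma, W)).real
    lam = np.linalg.eigvalsh(Sigma)     # ascending eigenvalues of Sigma
    ell = np.linalg.eigvalsh(W)         # ascending eigenvalues of W
    rhs = np.sum(ell / lam)
    return lhs, rhs

Sigma = np.diag([1.0, 2.0])
W = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
print(both_sides(Sigma, W))   # the traces differ: 3 versus 2.5
```

Pairing the eigenvalues in the opposite order gives 3.5, so no ordering of the eigenvalues rescues the identity for this pair.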

Gross and Richards [96] defined a differential operator (p. 788, section 2.2) as part of their development of zonal polynomials. However, the operator appears not to have been necessary for the results they were after. In particular, their equation 2.2.8 is not required for proving their equation 2.2.5. Suppose a polynomial is defined as a vector in the way done by Broida and Williamson [47]. Then the inner product between two polynomials that Gross and Richards defined using the differential operator can be replaced by the usual vector inner product on the vector forms of the two polynomials. Their later use of differentiation, to establish a set of coefficients that guarantees series convergence, provides a nice but not necessary motivation.

This is good in the sense that the ever-present, troubling issue of differentiability in the complex case can be avoided. However, a proof of differentiability still appears to be a desirable achievement, and it is part of the link back to the work by Stein and Weiss. I have not yet determined whether the set of functions I derived forms a complete set. Put another way, I have not yet determined the space on which the derived set of functions is complete.
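
Treating a polynomial as a vector of coefficients makes the comparison concrete. I assume here that the differential-operator inner product is of the Fischer type, $\langle p, q\rangle = [p(\partial)\bar q](0)$. This is an assumption about the form used, not a quotation from Gross and Richards. On monomials $x^a$, this inner product reduces to $\sum_a a!\, p_a \bar q_a$, where $a! = \prod_k a_k!$. The plain coefficient inner product is $\sum_a p_a \bar q_a$. The sketch below assumes plain Python 3.8 or later, and the function names are hypothetical. It computes both inner products: they differ only in the positive weight $a!$ attached to each monomial.

```python
from math import factorial, prod

def coefficient_ip(p, q):
    """Usual vector inner product of coefficient vectors: sum_a p_a * conj(q_a)."""
    return sum(c * complex(q[a]).conjugate() for a, c in p.items() if a in q)

def fischer_ip(p, q):
    """[p(d/dx) conj(q)](0) = sum_a a! * p_a * conj(q_a), where a! = prod of factorials."""
    return sum(prod(factorial(k) for k in a) * c * complex(q[a]).conjugate()
               for a, c in p.items() if a in q)

# Polynomials in x0, x1 stored as {exponent tuple: coefficient}.
p = {(2, 0): 1, (1, 1): 3}     # x0^2 + 3 x0 x1
q = {(2, 0): 2, (1, 1): 1}     # 2 x0^2 + x0 x1
print(coefficient_ip(p, q), fischer_ip(p, q))
```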

It would be useful to examine the family of generalized functions that results from the relaxed definition of the inner product. Fourier expansions are known to converge slowly, relative to other expansions, for some functions. It would be interesting to see whether a careful choice of inner product could produce an oblique set of functions that converges faster, or is easier to compute, than zonal polynomials.


8.1.3  Computation  of  Zonal  Polynomials 

Takemura [265] remarked on the relationship between real and complex zonal polynomials. This linkage needs to be translated into the language of engineering. The link itself has already been made by other researchers, and separate papers have been published on the computation of complex zonal polynomials. It is said that complex zonal polynomials are easier to compute than real zonal polynomials. I think the relationship can be obtained from a careful reading of Gross and Richards with that goal in mind. Care is still needed to determine whether the “easy” computations apply to symmetric complex Wishart matrices or to Hermitian Wishart matrices.

Similarly, the works of Constantine need to be folded into this study (see references [56] [57] [58] [59]). Constantine relied on zonal polynomials defined on positive definite complex symmetric (not Hermitian) matrix arguments. We know that symmetry and Hermitian symmetry give a matrix different properties. Constantine remarks [57] (p. 1272) that since zonal polynomials are polynomials in the characteristic roots, their definition can be extended to arbitrary complex symmetric matrices. Consistent with this, he defines hypergeometric functions of a complex symmetric matrix variable in terms of zonal polynomials of complex symmetric matrix argument.

Since Constantine [57] and James make use of the works of Herz [106], it makes sense to reexamine Herz’ work more closely to determine the restrictions Herz uses. For example, Herz studies $m \times m$ complex symmetric matrices with a positive definite real part. Does it naturally follow that the same zonal polynomial is defined for Hermitian symmetric matrices? I can produce a Hermitian matrix and a different complex symmetric matrix that have the same set of eigenvalues. The work by Gross and Richards [96] shows that zonal polynomials are definable for Hermitian matrices, but are these the same as those treated by James and Constantine?
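
Such a pair is easy to construct. In the sketch below (Python with NumPy assumed), the matrix $S = \begin{pmatrix} 1+i & 2 \\ 2 & 1-i \end{pmatrix}$ is my own illustrative choice. It is complex symmetric ($S^T = S$) but not Hermitian. Its characteristic polynomial is $\lambda^2 - 2\lambda - 2$, so its eigenvalues are the real numbers $1 \pm \sqrt{3}$. The real diagonal matrix of those eigenvalues is Hermitian and has the same spectrum. Consequently, any symmetric function of the eigenvalues, such as the trace or the determinant, cannot tell the two matrices apart.

```python
import numpy as np

S = np.array([[1 + 1j, 2], [2, 1 - 1j]])        # complex symmetric: S^T = S, but S^H != S
lam = np.sort(np.linalg.eigvals(S).real)        # 1 - sqrt(3), 1 + sqrt(3)
H = np.diag(lam).astype(complex)                # Hermitian (real diagonal), same eigenvalues

print(np.allclose(S, S.T), np.allclose(S, S.conj().T))   # True False
print(np.allclose(np.sort(np.linalg.eigvals(H).real), lam))
print(np.isclose(np.trace(S), np.trace(H)), np.isclose(np.linalg.det(S), np.linalg.det(H)))
```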

The question matters because Constantine’s work is the basis for evaluating zonal polynomials in a few specialized and very important cases, such as the noncentral real Wishart matrix. It may be that the basic properties carry over, but with different constants. Therefore, any derivative works based on James or Constantine need close attention when working with Hermitian matrices, to ensure that the cited results of previous workers apply. It might not be enough to merely say that a result applies to complex matrices. It would be logical to define a complex Wishart distribution that applies to a complex symmetric matrix, but I have not seen this issue raised explicitly. Given how important matrix structure has been in modifying results from the real to the complex Hermitian case, the question needs to be asked. The goal is to validate or achieve similar results consistent with the work by Gross and Richards.

As an aside, another consideration is that zonal polynomials are symmetric functions of their arguments. A function $f(x, y)$ is called “symmetric” if it is invariant under permutation of its arguments. This means that changing the order of the arguments does not change the value of the function. The trace and the determinant of a matrix are symmetric functions of the eigenvalues of the matrix. Thus, a zonal polynomial as developed by Gross and Richards [96] is a symmetric function of the eigenvalues of its Hermitian matrix argument, and its value depends only on the unordered eigenvalues of that argument.

Authors may be unintentionally injecting ambiguity into the research by not distinguishing between symmetry of the function and symmetry of the function’s argument. When interest was restricted to real symmetric matrices, there was no need to distinguish carefully between the two, because the answer was “yes” in both cases. When working with fields that have more structure, more care is needed.

The comments in the preceding paragraphs are not criticisms of the work of the authors mentioned. Rather, they point out that this work is related to the present application. Because the terminology is so similar, an unwarned researcher may inadvertently apply results directly without first asking whether the same set of assumptions is used.


8.1.4  Calculus  of  Zonal  Polynomials 

An immediate next step that will lead to useful results hinges on the ability to integrate zonal polynomials. This integration is necessary to evaluate the marginal distributions of test statistics. The work in this thesis leads up to the point where changes of variables can be used to construct test statistics in much the same way Krishnaiah has done for the real variables case. Krishnaiah has stated the results for several integrals.

8.1.5  Distributional  Theory  and  Tools 

There is still much work to be done to support the development of statistics of complex variables for application to problems of engineering and physics. Two areas are critical to this process.

The first is the development and systematization of Jacobians for changes of variables, for which Roy’s 1952 work [228] is a starting point. Distributional work relies very heavily on being able to perform well-selected changes of variables, and there are plenty of results for real variables that need to be adapted to complex variables. Distributional work needs to begin with the Gaussian case, but must grow beyond it. Work such as Olkin and Roy’s 1954 paper [198] is applicable and needs to be extended to complex variables. For the physicist, this work needs to be done for quaternions as well as for complex variables.

The second area that needs work is the study of invariance at an abstract level. This area has been partially addressed through the attention given to Gross and Richards.
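
A minimal example shows why complex Jacobians cannot be copied from the real case. Write $z = x + iy$. Then the linear change of variables $z \mapsto Az$ acts on the real coordinates $(x, y)$ through the $2p \times 2p$ matrix $\begin{pmatrix} \operatorname{Re}A & -\operatorname{Im}A \\ \operatorname{Im}A & \operatorname{Re}A \end{pmatrix}$. Its determinant, the real Jacobian, is $|\det A|^2$ rather than $\det A$. Applying this to each of the $n$ columns gives the familiar result that $Y = AX$, for $p \times n$ complex $X$, has Jacobian $|\det A|^{2n}$. The sketch assumes Python with NumPy, and `real_representation` is a hypothetical helper name.

```python
import numpy as np

def real_representation(A):
    """Real 2p x 2p matrix of z -> A z acting on the stacked vector (Re z, Im z)."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
jac = np.linalg.det(real_representation(A))          # real Jacobian of the change of variables
print(np.isclose(jac, abs(np.linalg.det(A)) ** 2))   # True: the Jacobian is |det A|^2
```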

8.1.6 Noncentral Distributions and Power Functions

Once  we  have  the  results  for  the  central  complex  Wishart  distribution  based 
tests  worked  out,  we  must  turn  attention  to  the  development  of  the  noncentral 
complex  Wishart  distribution  and  the  noncentral  distribution  of  related  test 
statistics.  From  these,  we  need  to  work  out  the  power  functions  for  the  various 
tests.  As  we  saw  for  the  case  of  the  central  complex  Wishart  distribution, 
we  cannot  yet  take  for  granted  that  previously  published  results  are  valid. 
Those results, where they exist, should be reexamined carefully, and the necessary related tools should be developed along the way.


8.2  Bridging  Theory  and  Application 

8.2.1  Important  Authors  to  Consult 

Krishnaiah is the most prolific author examining specific tests based on the eigenvalues of a real or complex Wishart matrix. It may be worth some effort to understand the report by Krishnaiah and Schuurmann [151]. They appear to have performed evaluations for special cases in which expressions for zonal polynomials might be available.

The works of C. R. Rao deserve much greater attention. His work needs to be examined thoroughly and applied to this problem (as well as to other problems). One important work that I rediscovered after the research phase of this thesis was finished is reference [214]. It includes work on matrix approximations specifically applied to complex matrices.

Another  author  deserving  of  attention  is  Steen  Andersson.  See  references 
[27]  [28]  [29].  The  first  paper  considers  distributions  of  maximal  invariants 
using  quotient  measures.  This  work  should  be  studied  in  the  context  of  Gross 
and  Richards  [96].  The  second  paper  considers  testing  various  real  matrices 
to  determine  if  they  have  complex  or  quaternion  structures.  The  joint  density 
of the eigenvalues is derived up to an unspecified norming constant, and
the  exact  values  of  all  norming  constants  are  derived  simultaneously  using  a 
method  involving  recursion  formulae.  The  moments  of  the  likelihood  ratio 
statistics  used  in  testing  are  obtained  from  these  norming  constants. 

Likewise, anything written by Muirhead should receive attention. Muirhead’s work is particularly useful in developing the machinery to obtain noncentral distributions, which are required for determining the power of tests. His use of areas of mathematics that are nontraditional for statistical (and engineering) work is what enables him to compute some otherwise very difficult noncentral distributions.

An idea from Kshirsagar [154] is to obtain the distribution of the test statistic from its moments. This approach has not been attempted in this thesis, but it deserves consideration by future workers. In particular, Kshirsagar applies this concept to sphericity tests. Recall that a distribution can be completely defined by all of its moments, provided those moments determine it uniquely, as they do for a statistic confined to a bounded interval. Kshirsagar is a recognized authority in multivariate analysis, and in principal components and test distribution theory in particular.


8.2.2  Small  Sample  Test  Theory 

The fundamental question of what constitutes a good estimator for small sample statistics deserves some study. The attention and controversy regarding the consistency properties of some information theoretic methods (particularly AIC) imply that this question has yet to be answered authoritatively. For example, we know that consistency, as technically defined in statistics, is neither a required nor a necessarily desirable property [55]. However, the engineering literature [129] has treated lack of consistency as a disqualifying property. That is appropriate for the large sample case, but not, in itself, for the small sample case. Refer to section 4.3.1 for a detailed discussion of this issue.

There are other questions as well:

- Will a biased minimum variance estimate do?
- Should you apply asymmetric confidence bounds based on a cost function derived from a utility curve?
- Must the test statistic for comparing estimators be invariant with respect to coordinate transformations?

These and other questions need to be collected and systematically examined. The idea of looking at parameter estimators for a distribution in terms of bias and similar properties is not new, and it should be incorporated into the thinking about the order estimation problem. Bickel and Doksum [40] discuss this at length. Their book is one of the finest texts on mathematical statistics that does not require the reader to have a background in measure theory.

8.2.3  Approximation  Theory 

The approximation techniques discussed in Keener’s well-written text [131]
need  to  be  applied  to  the  problem  to  obtain  practical  (easily  computable) 
results  once  we  understand  what  the  correct  exact  forms  are.  Doing  this  does 
not  require  a  radically  nontraditional  mathematical  background  for  engineers 
once  the  basic  form  for  zonal  polynomials  is  understood. 

8.2.4  Burnside’s  Theorem  and  Characteristic  Functions 

Newman’s presentation [192] (p. 166) of Burnside’s theorem on irreducible sets of matrices raises an interesting question. The question is related to the subject of this thesis because the notion of eigenvalues is intimately bound up with the theory of invariance and irreducibility. Burnside’s theorem is stated as follows.

Theorem 11 Let $F$ be algebraically closed. Let $G = \{A\}$ be a subgroup of $GL(n, F)$ that is irreducible as a set of matrices. Then a relationship $\sum_{p,q} T_{pq} a_{pq} = 0$, with $T_{pq} \in F$, can hold for all $A = (a_{pq})$ in $G$ if and only if $T_{pq} = 0$ for all $p, q$.

Now take $i\,\mathrm{Re}\{\sum_{p,q} T_{pq} a_{pq}\}$ as the argument of the exponential function. Its expectation is a characteristic function. When you generate moments from a characteristic function, you evaluate its derivatives at $T = 0$. So the question is: “What does Burnside’s theorem say about characteristic functions?” Characteristic functions sometimes provide an easier route to the distributional results we need in the order estimation problem. Answering this question for the complex case will therefore give us insight into an important tool.


8.2.5 Sturm Separation Theorem and Parallel Processing

Another tool that deserves inquiry is the Sturm Separation Theorem as found in C. R. Rao (p. 64, section 1f.2.13(vi)) [213]. The advent of parallel processing makes it practical to use this theorem to compute eigenvalues. The idea is that the eigenvalues of a principal minor of a matrix provide bounds for starting the search for the eigenvalues of the next larger principal minor. Let $A_k$ be the $k$th leading principal submatrix of the square Hermitian matrix $A$, with eigenvalues $\mu_1 \ge \cdots \ge \mu_k$. Let $\lambda_1 \ge \cdots \ge \lambda_{k+1}$ be the eigenvalues of $A_{k+1}$. Then $\lambda_1 \ge \mu_1 \ge \lambda_2 \ge \mu_2 \ge \cdots \ge \mu_k \ge \lambda_{k+1}$.
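
The bracketing idea can be sketched as follows. For a Hermitian matrix, the eigenvalues of the leading submatrix $A_k$ interlace those of $A_{k+1}$. The $k$ eigenvalues of $A_k$, together with an outer bound on the spectrum, therefore cut the real line into $k + 1$ intervals, each holding exactly one eigenvalue of $A_{k+1}$. The searches in these intervals are independent, which is the part that could be farmed out to parallel processors.

The sketch below assumes Python with NumPy, and the function names are hypothetical. I chose bisection on the sign of $\det(A_{k+1} - xI)$ as the root finder, and the outer bound is the maximum absolute row sum plus one. The sketch also assumes strict interlacing, which holds for generic Hermitian matrices.

```python
import numpy as np

def bisect_root(f, lo, hi, tol=1e-12, max_iter=200):
    """Locate the single sign change of f on (lo, hi) by bisection."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid      # sign change lies in (mid, hi)
        else:
            hi = mid                   # sign change lies in (lo, mid)
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def eigvals_by_interlacing(A):
    """Eigenvalues (descending) of a Hermitian A, grown one leading submatrix at a time."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    bound = np.max(np.sum(np.abs(A), axis=1)) + 1.0   # exceeds every |eigenvalue|
    mu = [A[0, 0].real]                                # eigenvalue of the 1 x 1 submatrix
    for k in range(1, n):
        M = A[:k + 1, :k + 1]

        def charpoly(x, M=M):
            return np.linalg.det(M - x * np.eye(M.shape[0])).real

        edges = [bound] + mu + [-bound]                # brackets from interlacing
        # Each bracket holds exactly one eigenvalue of M, so these searches are
        # independent and could run on separate processors.
        mu = [bisect_root(charpoly, edges[i + 1], edges[i]) for i in range(k + 1)]
    return np.array(mu)

rng = np.random.default_rng(6)
X = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
H = (X + X.conj().T) / 2
print(eigvals_by_interlacing(H))
print(np.linalg.eigvalsh(H)[::-1])   # reference values, descending
```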


8.2.6  Empirical  Characteristic  Functions 
Epps discussed characteristic functions from a geometric point of view in a 1993 tutorial [78]. In this paper, he proposed using empirical characteristic functions as a basis for hypothesis tests. Epps points out that such an approach makes it easier to estimate the parameters of mixture distributions, which is relevant to sonar.
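
The basic ingredient of such a test is straightforward. For complex samples $z_1, \ldots, z_n$, the empirical characteristic function is $\hat\phi(t) = \frac{1}{n}\sum_k \exp(i\,\mathrm{Re}(\bar t z_k))$, and it can be compared with the characteristic function of a hypothesized model at chosen points $t$. The sketch below takes a circular complex Gaussian $\mathcal{CN}(0, \sigma^2)$ as the hypothesized model. This is an assumption for illustration, and its characteristic function is $\exp(-\sigma^2 |t|^2 / 4)$. The sketch does not reproduce Epps' test statistic, which would combine such differences over many $t$ into a weighted distance. It assumes Python with NumPy, and the function names are hypothetical.

```python
import numpy as np

def empirical_cf(z, t):
    """Empirical characteristic function (1/n) sum_k exp(i Re(conj(t) z_k))."""
    return np.mean(np.exp(1j * np.real(np.conj(t) * z)))

def gaussian_cf(t, sigma2):
    """Characteristic function of a circular complex Gaussian CN(0, sigma2)."""
    return np.exp(-sigma2 * abs(t) ** 2 / 4)

rng = np.random.default_rng(5)
sigma2 = 2.0
z = np.sqrt(sigma2 / 2) * (rng.standard_normal(20000) + 1j * rng.standard_normal(20000))
for t in [0.5, 1.0, 1 + 1j]:
    print(t, abs(empirical_cf(z, t) - gaussian_cf(t, sigma2)))   # small when the model fits
```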


8.3  Acoustics  and  Signal  Processing 

8.3.1  Processor  Structure 

As the necessary mathematical tools are developed, some thought will need to be given to implementation. Tests based on a likelihood ratio can be realized with an estimator-correlator (or estimator-subtractor) structure, as is the case for the sample eigenvalue ratios. The F-tests, however, were not derived from a likelihood ratio. These tests still need to be examined, and the structures of their associated processors still need to be devised.

8.3.2  Time  Variation  of  Noise  Field 

This thesis supports two fundamental approaches, but further research results in acoustic oceanography are needed to decide which approach is appropriate at the moment. In one approach, you consider only the sample covariance of signal plus noise, $S + N$. In the other approach, you obtain an independent estimate $N^*$ of the noise and then look at $(S + N) - N^*$. If your estimate of the noise covariance is good, you want to choose the second method. If it is bad, you want to choose the first method. The trade-off point is a fundamental statistical question that needs to be answered.

Then you need to consider the acoustic oceanography aspects. Over what period of time is an estimate likely to be good enough to use? How do you best propagate noise estimates in time? The Kalman filter approach is one way. Data such as sea state, precipitation rate, and noise in frequency bands other than the band of interest can serve as concomitant variables. Geography, array geometry, and array platform orientation can also be included in the method for predicting the sample noise covariance matrix. The paper by Scharf and Lytle [235] is related to these questions, as is the paper by Scharf and Tufts [236].
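
The Kalman filter idea can be sketched in its simplest form: tracking a single noise power level over successive snapshots. Everything here is an illustrative assumption:

- a scalar state rather than a full covariance matrix;
- a random-walk state model with variance `q`;
- a measurement noise variance `r`.

Concomitant variables such as sea state would enter through a richer state model, which is not shown. The sketch assumes plain Python, and `track_noise_power` is a hypothetical name.

```python
def track_noise_power(measurements, x0, p0, q, r):
    """Scalar Kalman filter for a noise power level modeled as a random walk.

    State model: x_k = x_{k-1} + w_k, Var(w_k) = q (assumed)
    Measurement: y_k = x_k + v_k,     Var(v_k) = r (assumed)
    Returns the list of filtered estimates.
    """
    x, p = x0, p0
    estimates = []
    for y in measurements:
        p = p + q                 # predict: uncertainty grows between snapshots
        k = p / (p + r)           # Kalman gain
        x = x + k * (y - x)       # update with the new noise-power measurement
        p = (1 - k) * p           # updated error variance
        estimates.append(x)
    return estimates

est = track_noise_power([5.0] * 50, x0=0.0, p0=100.0, q=0.01, r=1.0)
print(est[-1])   # converges toward the constant measured level 5.0
```

A larger `q` lets the estimate follow a changing noise field faster but averages fewer snapshots. This is one way to frame the question of how long an estimate remains good enough to use.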

8.3.3  Patterned  Arrays 

Patterned arrays impose a structure on the covariance matrix of data passing through the beamformer. This additional structure modifies the Jacobians for the changes of variables used to derive the probability distributions that ultimately appear in the sampling distribution of covariance matrix eigenvalues. The approach taken in this thesis is an important first step. However, it is not sufficient for cases of special importance, such as the line array with equally spaced elements. Idealized line arrays have been studied extensively, and their covariance matrix is often modeled as a Toeplitz matrix. Random distortions in a towed line array invalidate the assumptions that make an idealized line array efficient and simple to work with.
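
The loss of Toeplitz structure under distortion can be shown with a one-source model. The model makes the following assumptions:

- element positions are measured in wavelengths along the array axis;
- there is one plane-wave source with $u$ equal to the sine of its arrival angle;
- the noise is spatially white.

With equal spacing, entry $(m, n)$ of the covariance depends only on $m - n$, so the matrix is Toeplitz. Small random position errors break that property. The sketch assumes Python with NumPy. The function names and the size of the position errors are hypothetical.

```python
import numpy as np

def steering(positions, u):
    """Plane-wave response exp(2*pi*i*x*u); positions x in wavelengths."""
    return np.exp(2j * np.pi * np.asarray(positions) * u)

def covariance(positions, u, signal_power=1.0, noise_power=0.1):
    """One plane-wave source plus spatially white noise."""
    a = steering(positions, u)
    return signal_power * np.outer(a, a.conj()) + noise_power * np.eye(len(a))

def is_toeplitz(R, tol=1e-10):
    """True if every diagonal of R is constant."""
    n = R.shape[0]
    return all(np.allclose(np.diag(R, k), np.diag(R, k)[0], atol=tol)
               for k in range(-n + 1, n))

ideal = 0.5 * np.arange(6)                              # half-wavelength spacing
rng = np.random.default_rng(7)
distorted = ideal + 0.05 * rng.standard_normal(6)       # hypothetical position errors
print(is_toeplitz(covariance(ideal, 0.3)), is_toeplitz(covariance(distorted, 0.3)))
```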

Considering the matrix complex normal distribution raises the question of how it can be applied to rectangular receiving arrays. The distribution has convenient built-in parameter matrices that can be thought of as a row covariance matrix and a column covariance matrix.

8.3.4  Multipath  Detection 

This work is related to the problem of detection in a multipath environment. To increase the probability of detection, you would like to consider several paths simultaneously rather than treat each path independently. Mirkin’s thesis [183] on the use of stochastic maximum likelihood estimators considers not only several paths simultaneously but also the whole acoustic field. To formulate the problem, he requires that the number of sources be known. The techniques in this thesis address part of the problem of estimating the number of sources, but they are not sufficient by themselves to solve the whole problem.

The general unstructured array described at the beginning of this thesis is a three-dimensional array. Thus any discussion relating to determining the number or direction of signal arrival paths applies regardless of the direction from which the signal arrives.

The clustering of arrival paths and signals into sources is another problem, with its own set of competing theories for attacking it, yet the statistical concepts of this research are a piece of that larger problem. Other related theories include cluster analysis, discriminant analysis, factor analysis, probabilistic neural networks, and expert systems theory.

Buckley [48] addresses the problem from another direction, by looking at it backwards. He does so for a general array, applies a Karhunen-Loeve decomposition in his treatment, and uses a norm to decide whether he has reconstructed his signal “well enough” as a way of selecting the number of significant singular values. It would be worthwhile to revisit his work with a view to making that determination from a statistical point of view.
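The reconstruction-norm criterion can be sketched as follows. This is a generic illustration, not Buckley's procedure: it picks the smallest $k$ for which the rank-$k$ truncated SVD reconstructs the data matrix to within a relative Frobenius-norm tolerance `tol`, using the fact that the best rank-$k$ error is the root of the discarded singular-value energy. The function name, the tolerance, and the simulated two-source data are assumptions for illustration. The tolerance is an ad hoc threshold, which is precisely the step a statistical test would replace.

```python
import numpy as np

def rank_by_reconstruction(X, tol):
    """Smallest k such that the rank-k truncated SVD of X reconstructs X
    with relative Frobenius error at most tol."""
    s = np.linalg.svd(X, compute_uv=False)
    total = np.sum(s ** 2)
    for k in range(len(s) + 1):
        # Error of the best rank-k approximation (Eckart-Young): root of the tail energy.
        rel_err = np.sqrt(np.sum(s[k:] ** 2) / total)
        if rel_err <= tol:
            return k
    return len(s)

rng = np.random.default_rng(1)
m, n, d = 8, 200, 2   # sensors, snapshots, simulated number of sources
steer = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
sig = rng.standard_normal((d, n)) + 1j * rng.standard_normal((d, n))
noise = 0.05 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
X = steer @ sig + noise
print(rank_by_reconstruction(X, tol=0.1))
```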

8.3.5  Analogy  of  Temporal  and  Spatial  Domain  Signal  Processing

It is routine to remark that there is a mapping between signal processing in the time domain and array processing in the spatial domain: both are filter theory applied in different domains. The mapping usually noted is the relation between linear arrays and sampling in the time domain. It would be of interest to take the theory of array processing and map it back to time domain processing to see what can be learned.
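The usual mapping can be made concrete with a short sketch. For a narrowband plane wave on an equally spaced line array, the steering vector is a sampled complex exponential: the element index plays the role of the time index, and $d\sin\theta/\lambda$ plays the role of normalized frequency. This sketch assumes an $e^{+j2\pi f\tau}$ phase convention and angles measured from broadside; the function names and parameter values are illustrative.

```python
import numpy as np

def ula_steering(m, d, c, f, theta):
    """Narrowband steering vector of an m-element uniform line array.
    d: element spacing, c: sound speed, f: frequency, theta: angle from broadside."""
    tau = np.arange(m) * d * np.sin(theta) / c   # plane-wave delay at each element
    return np.exp(2j * np.pi * f * tau)

def sampled_tone(m, f_norm):
    """m time samples of a complex sinusoid at normalized frequency f_norm (cycles/sample)."""
    return np.exp(2j * np.pi * f_norm * np.arange(m))

m, d, c, f = 16, 1.0, 1500.0, 750.0   # half-wavelength spacing: lambda = c / f = 2 d
theta = np.pi / 6
a = ula_steering(m, d, c, f, theta)
x = sampled_tone(m, f * d * np.sin(theta) / c)   # spatial frequency d sin(theta) / lambda
print(np.allclose(a, x))                  # same vector: element index acts as time index
print(np.argmax(np.abs(np.fft.fft(a))))   # a DFT across the array acts as a bank of beams
```

Read in the other direction, a DFT filter bank in time corresponds to a set of fixed beams across the array, which is one place where results could be carried back to time domain processing.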

8.3.6  State  Space  Processing 

The paper by Prasad and Chandna [208] proposed a state space approach to bearing determination for a uniform line array, using canonical correlation to solve for the coefficients. By combining these two ideas, the concept might be generalizable to an arbitrary array. The techniques of this thesis bear on choosing the number of significant eigenvalues in the canonical correlation.
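The canonical correlations referred to above are the singular values of the whitened cross-covariance $\hat\Sigma_{xx}^{-1/2}\hat\Sigma_{xy}\hat\Sigma_{yy}^{-1/2}$, and deciding how many of them are significant is an order-determination problem. Here is a minimal sketch for complex data, assuming zero-mean snapshots stored as columns. It is not the Prasad and Chandna formulation, and the helper names and simulated one-signal data are hypothetical.

```python
import numpy as np

def inv_sqrt_hermitian(S):
    """S^{-1/2} for a Hermitian positive definite matrix S."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.conj().T

def canonical_correlations(X, Y):
    """Sample canonical correlations between zero-mean complex data X (p x n) and Y (q x n)."""
    n = X.shape[1]
    Sxx = X @ X.conj().T / n
    Syy = Y @ Y.conj().T / n
    Sxy = X @ Y.conj().T / n
    K = inv_sqrt_hermitian(Sxx) @ Sxy @ inv_sqrt_hermitian(Syy)
    return np.linalg.svd(K, compute_uv=False)   # sorted in descending order

def cn(rng, *shape):
    """i.i.d. standard circular complex normal samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(2)
n = 5000
s = cn(rng, 1, n)                              # one common latent signal
X = cn(rng, 3, 1) @ s + 0.3 * cn(rng, 3, n)
Y = cn(rng, 4, 1) @ s + 0.3 * cn(rng, 4, n)
rho = canonical_correlations(X, Y)
print(np.round(rho, 3))   # one large correlation, the rest near zero
```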

8.3.7  Application  to  Intensity  Measurement 

There might be some application of the statistical work developed in this thesis to the estimation of noise when setting up an intensity measurement experiment, and later to accounting for noise in the analysis of the data. Since pressure is treated as a complex quantity, it is natural to apply the statistics of complex variables when examining sources of variation. For example, one could estimate the noise field at different locations before turning the source on. During the experiment, you then have the ability to perform a hypothesis test to determine whether a nodal line in the field has been found: you test whether the observed pressure is statistically significantly different from the previously measured noise-only case. The same test, interpreted another way, tells you the chance that the data you are getting is something other than noise. As another example, you could use the covariance matrix decomposition of data from 4 microphones to determine directions of arrival in three dimensions and estimate the variance from that direction.
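Here is a minimal sketch of the noise-versus-signal test just described. It makes simplifying assumptions that go beyond the text. First, the noise-only pressure at a location is circular complex Gaussian with variance $\sigma^2$, estimated well enough beforehand to be treated as known. Second, the $M$ snapshots taken with the source on are independent. Under the noise-only hypothesis, $T = \sum_m |p_m|^2/\sigma^2$ then has a Gamma$(M, 1)$ distribution ($2T$ is chi-square with $2M$ degrees of freedom), whose survival function has the closed form $e^{-T}\sum_{k<M} T^k/k!$. The function name and the pressure values are illustrative.

```python
import math

def noise_only_p_value(pressures, noise_var):
    """P-value for H0: the complex pressures are i.i.d. CN(0, noise_var) noise.
    The statistic T = sum |p|^2 / noise_var is Gamma(M, 1) under H0."""
    M = len(pressures)
    T = sum(abs(p) ** 2 for p in pressures) / noise_var
    # Survival function of Gamma(M, 1) for integer M: exp(-T) * sum_{k<M} T^k / k!
    return math.exp(-T) * sum(T ** k / math.factorial(k) for k in range(M))

noise_var = 1.0
quiet = [0.5 + 0.3j, -0.2 + 0.8j, 0.1 - 0.4j, -0.7 - 0.1j]  # comparable to the noise level
loud = [3.0 + 2.0j, 2.5 - 3.1j, -3.3 + 2.2j, 2.9 + 2.7j]    # well above the noise level
print(noise_only_p_value(quiet, noise_var))  # large: consistent with noise only
print(noise_only_p_value(loud, noise_var))   # tiny: evidence of something other than noise
```

If $\sigma^2$ is instead estimated from a modest number of noise-only snapshots, the known-variance assumption no longer holds, and a ratio statistic with an $F$ distribution would be the natural replacement.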

Appendix  A

MATHEMATICAL  BACKGROUND

A.1  Organization  of  Appendices

Much of the material presented in this and the appendices to follow is new, in the sense that the forms presented here have not been expressed explicitly in the context of a development of the algebra of complex-valued vectors and matrices. At the same time, most of it is old, in the sense that the basic concepts are well understood, have occasionally been proven for a more general case which includes complex variables, or are so trivial that no one thought them important enough to write down for publication.

The  appendices  are  arranged  as  follows.  The  first  set  (A-F)  consists  of 
material  for  this  thesis  which  is  necessary  background  and  which  I  think  readers 
would  be  most  interested  in  referring  to.  This  set  includes  the  appendices  on 
matrix  differential  operators,  definitions  and  properties  of  distributions,  and 
density  functions  of  distributions.  The  material  on  characteristic  functions  is 
especially  important,  and  also  fun. 

The  second  set  of  appendices  (G-J)  consists  of  very  important  abstract 
mathematical  background  which  any  researcher  desiring  to  extend  or  critique 
this  thesis  must  master,  but  which  I  think  is  of  secondary  interest  to  the 
reader  who  wants  instead  to  be  a  user  of  the  results  of  this  thesis.  This 
second  set  includes  the  appendices  on  zonal  polynomials,  group  theory,  Hilbert
space,  and  complex  vector  space.  I  expect  this  to  be  of  primary  value  to 
those  with  engineering  background  who  need  to  quickly  understand  the  group 
theoretic  concepts.  A  mathematician  will  find  this  section  trite.  I  found 
the  source  material  on  zonal  polynomials  at  times  very  difficult.  Perhaps 
the  most  challenging  and  important  contributions  of  this  thesis  are  found  in 
the  appendix  on  zonal  polynomials.  My  hope  is  that  you  will  realize  a  time 
savings  in  developing  an  understanding  by  this  translation  from  the  language 
of  mathematicians  to  the  language  of  engineers. 

The third set of appendices (K-P) consists of basic linear algebra of complex matrices. It is material I expect any technical senior undergraduate to be capable of producing, but which is not assembled elsewhere in texts or the research literature in an expository fashion. The real variables forms of most of these results are part of the routine working knowledge of anyone who has had one reasonable course in matrix algebra. It is included because the details of the results are used throughout this thesis, and I have noticed that the similarity of forms with the real variables cases has occasionally led other researchers to make minor errors by a factor of 2, typically in constant multipliers or powers.

A.2  Mathematical  Background  References

Although this thesis is of primary importance to engineers, the type of mathematics used in it is outside the usual background of either engineers or statisticians. A few good books provide good preparation for the material in this thesis. Even though a thesis is supposed to be self-contained and stand alone, knowing which references are useful makes the process of reading and learning far more efficient. These references may also provide the background for specialized terminology which I may not have adequately defined for a serious reader.


A.2.1  Linear  Algebra

There  are  some  very  good  books  on  linear  algebra.  My  favorite  is  the  quality 
text  by  Broida  and  Williamson  [47].  Beyond  the  regular  fare  of  linear  algebra 
texts  an  engineer  is  likely  to  study  from,  it  introduces  groups  in  a  natural 
way  early  in  order  to  build  on  the  concept.  It  has  a  very  nice  treatment  of 
determinants  and  polynomials.  Its  presentation  of  a  polynomial  as  an  n-tuple 
is  the  key  to  avoiding  the  differential  operator  which  Gross  and  Richards  used 
in  their  development  of  zonal  polynomials.  Broida  and  Williamson  also  discuss 
multilinear  mappings,  exterior  products,  and  Hilbert  spaces.  Also  of  interest 
to  engineers  and  physicists,  they  have  a  very  nice  introduction  to  tensors.

Another linear algebra text with a solid development is by Nomizu [193]. This is an undergraduate text intended for mathematics majors. Chapter 8 of [193] is very important for distinguishing properties associated with Hermitian ($A = A^H$), unitary ($AA^H = A^HA = I$), normal ($AA^H = A^HA$), symmetric ($(Ax, y) = (x, Ay)$ for all $x, y \in V$), and orthogonal transformations. Note that there exist complex orthogonal transformations ($A^TA = AA^T = I$) as well as unitary transformations ($B^HB = BB^H = I$), and the two classes are not the same. I know of no other text that points out these distinctions. In extending work from the real field to the complex field, this means that one cannot take for granted that the properties claimed as the important ingredients of a proof really are the ones being used. When working with the field of real numbers, stronger conditions are often hypothesized when weaker ones would do, because for real symmetric positive definite matrices the various properties coincide.
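The difference between complex orthogonal and unitary matrices is easy to check numerically. In this minimal sketch, the matrix $A = \begin{pmatrix}\cosh t & i\sinh t\\ -i\sinh t & \cosh t\end{pmatrix}$ with real $t \neq 0$ satisfies $A^TA = I$, because $\cosh^2 t + (i\sinh t)^2 = 1$, but it is not unitary. Conversely, $B = \mathrm{diag}(1, i)$ is unitary but not complex orthogonal. Both examples are my own choices for illustration, not ones taken from [193].

```python
import numpy as np

t = 0.7
# Complex orthogonal: A^T A = I because cosh^2 t + (i sinh t)^2 = 1.
A = np.array([[np.cosh(t), 1j * np.sinh(t)],
              [-1j * np.sinh(t), np.cosh(t)]])
# Unitary: B^H B = I, but B^T B = diag(1, -1).
B = np.diag([1.0, 1j])
I = np.eye(2)

print(np.allclose(A.T @ A, I), np.allclose(A.conj().T @ A, I))  # True False
print(np.allclose(B.conj().T @ B, I), np.allclose(B.T @ B, I))  # True False
```

Over the reals the two tests are identical, which is exactly why a proof written for real matrices may never say which one it actually needs.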

The book on numerical linear algebra by G. W. Stewart [259] is a gentle, yet mathematically respectable, advanced undergraduate or first year graduate text that treats complex matrices when it can be done without much additional effort. Chapter 5, on eigenvalues and eigenvectors, is developed in $\mathbb{C}^n$, which is the natural setting for discussing Gershgorin disks. This is a classic example where working in $\mathbb{C}$ makes a topic easy and working in $\mathbb{R}$ makes the discussion very complicated. He has a nice treatment of norms and condition numbers. He recommends perturbation estimates as a quick, non-rigorous look at what might otherwise be a mathematically difficult problem. Stewart and Sun coauthored a fine sequel [260] devoted to perturbation analysis that is also worth using. Among other topics, this book examines the relationship between the singular values of a matrix and those of partitions of that matrix, and also the singular values of linear transformations of that matrix.
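Gershgorin's theorem shows why $\mathbb{C}$ is the natural setting: every eigenvalue of a complex $n \times n$ matrix $A$ lies in the union of the disks in the complex plane centered at $a_{ii}$ with radius $\sum_{j \ne i} |a_{ij}|$. Here is a minimal numerical check. The matrix is an arbitrary illustrative choice.

```python
import numpy as np

def gershgorin_disks(A):
    """Centers a_ii and radii sum_{j != i} |a_ij| of the Gershgorin row disks."""
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return centers, radii

A = np.array([[4 + 1j, 0.5, 0.2j],
              [0.3, -2 + 0j, 0.4],
              [0.1j, 0.2, 1 - 3j]])
centers, radii = gershgorin_disks(A)
eigvals = np.linalg.eigvals(A)
# Each eigenvalue must fall inside at least one disk (small tolerance for rounding).
inside = [np.any(np.abs(lam - centers) <= radii + 1e-12) for lam in eigvals]
print(centers, radii)
print(all(inside))
```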

The book by Horn and Johnson [112] is a major text that deserves to be read as a prerequisite to Rao's text [213] on multivariate analysis. Among its various topics, this book discusses complex symmetric matrices and Gershgorin disks. It has an encyclopedic treatment of matrix algebra. It does not treat differentiation or integration of matrices.


A.2.2  Multivariate  Statistics 

C. R. Rao's book [213] is a wonderful treatment of multivariate statistics. He does not shy away from powerful generalizations where they can be made profitably. He uses matrix notation throughout. He shares with Skudrzyk [248] the wonderful habit of carefully laying down the mathematical tools before leaping into the subject material requiring them. Unlike in this thesis, the mathematical background is not hidden in appendices. He introduces those points of measure theory necessary for later work. C. R. Rao's references are extensive and reflect a respectful sense of history.

The text by Eaton [74] is suitable preparation for working with complex statistics because he approaches the subject using vector space and invariance methods. Other than Miller's books, Eaton's book says more about statistics of complex variables than any other multivariate text I have found. He includes some discussion of complex statistics and its relationship to statistics of real variables. Eaton is a nice repository of clever insights that make derivations much easier. For example, he imposes a condition on $T$ that takes advantage of the Hermitian symmetry of the covariance matrix to generate a change of variables of the standard complex normal distribution. This is used to obtain a chi-square distribution, which becomes the seed for growing the Wishart distribution. This is a worthy book to study after reading Broida and Williamson.
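The chi-square seed, and the factor-of-2 pitfalls mentioned in A.1, can be checked by simulation. This minimal sketch assumes the standard circular complex normal convention $z = (x + iy)/\sqrt{2}$, where $x$ and $y$ are independent $N(0,1)$. Under that convention $E|z|^2 = 1$ and $2|z|^2$ is chi-square with 2 degrees of freedom. The complex Wishart matrix $W = \sum_{k=1}^{N} z_k z_k^H$ built from $N$ i.i.d. $CN(0,\Sigma)$ vectors then has $E[W] = N\Sigma$. This illustrates those standard facts only. It is not Eaton's derivation, and $\Sigma$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)

# Standard circular complex normal CN(0, 1): real and imaginary parts each have variance 1/2.
m = 200_000
z = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
q = 2 * np.abs(z) ** 2          # should be chi-square with 2 degrees of freedom
print(np.mean(np.abs(z) ** 2))  # near 1, not 2
print(np.mean(q), np.var(q))    # near 2 and 4, the chi-square(2) mean and variance

# Complex Wishart: W = sum_k z_k z_k^H with z_k i.i.d. CN(0, Sigma).
Sigma = np.array([[1.0, 0.4 - 0.2j], [0.4 + 0.2j, 2.0]])
L = np.linalg.cholesky(Sigma)   # Sigma = L L^H
N, reps = 5, 20_000
G = (rng.standard_normal((reps, N, 2)) + 1j * rng.standard_normal((reps, N, 2))) / np.sqrt(2)
Z = G @ L.conj().T              # row k of Z is z_k^H, where z_k = L g_k^H ~ CN(0, Sigma)
W = Z.conj().transpose(0, 2, 1) @ Z     # Z^H Z = sum_k z_k z_k^H
print(np.round(W.mean(axis=0) / N, 2))  # near Sigma, since E[W] = N Sigma
```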

Another nice text on multivariate analysis is the one by Arnold [31]. This book is an important contribution to the literature. It is remarkable for its clear development of properties of multivariate distributions and testing. Arnold makes it natural to think of statistics from a multivariate point of view, and to view univariate statistics as special cases. He applies group theory sparingly, but does so where it is clearly advantageous. Arnold's proof of the real Wishart distribution density function by induction is a contribution to the conceptual development. His presentation of the real matrix normal distribution is also important.

A.2.3  Abstractions

Group invariance is a topic that has attracted much attention from leading researchers in statistics, and it is slowly making its way into the training of graduate students in theoretical statistics. I know of no simple text that introduces the concepts of group invariance, yet this subject is at the core of the knowledge needed to make progress on the order estimation problem. One monograph that has proven useful is the 1989 work by Eaton [75]. It begins with topological groups and discusses Haar measure by page 6. The second chapter, on group actions, very rapidly covers the material covered by Vilenkin. Eaton also covers the very important topic of maximal invariants.
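For the order estimation problem, the key maximal invariant is concrete. Under the unitary action $S \mapsto USU^H$ on Hermitian matrices, the ordered eigenvalues form a maximal invariant. They are unchanged by the action, and two Hermitian matrices with the same eigenvalues are always carried onto each other by some unitary matrix. Here is a minimal numerical illustration of both halves. The random unitary comes from a QR factorization of a complex Gaussian matrix, which is an illustrative choice and not a claim about Haar sampling.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
G = rng.standard_normal((n, 6)) + 1j * rng.standard_normal((n, 6))
S = G @ G.conj().T / 6                     # Hermitian, sample-covariance-like

Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
T = Q @ S @ Q.conj().T                     # the group action S -> U S U^H with U = Q

print(np.allclose(Q.conj().T @ Q, np.eye(n)))                     # Q is unitary
print(np.allclose(np.linalg.eigvalsh(S), np.linalg.eigvalsh(T)))  # eigenvalues invariant
print(np.allclose(S, T))                                          # yet S and T differ

# Maximality: matching eigendecompositions gives a unitary carrying T back to S.
wS, VS = np.linalg.eigh(S)
wT, VT = np.linalg.eigh(T)
U = VS @ VT.conj().T
print(np.allclose(U @ T @ U.conj().T, S))
```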

A text that is referenced by most authors at some point in the development of theory regarding zonal polynomials is the book by Littlewood [167].
This  book  is  not  one  that  can  be  rushed  through,  but  rather  must  be  worked 
through.  Be  prepared  for  lots  of  subscripts  and  tensor  style  notation.  The 
reader  should  also  have  a  basic  understanding  of  abstract  algebra  and  group 
theory.  Mastery  of  this  text  will  build  a  background  not  available  from  other 
sources  which  is  necessary  to  understand  current  literature.  Of  particular 
interest,  he  treats  groups  of  unitary  matrices. 

A.3  A  Real  Complex  Treatment

The educated reader may wonder why I have bothered to prove some apparently obvious theorems in matrix algebra. The answer is that not all properties taken for granted in the real variables case carry over to complex matrices. Lack of attention to such issues has led some very well respected researchers to write down erroneous results by inspection, based on forms known from the real variables case.

A clue that a result might need to be reproved (in both senses of the word) is that it requires symmetry or uses a transpose. Symmetry and Hermitian symmetry can both apply to complex matrices, yet they impose different properties. Symmetry has group theoretic properties which were used by earlier workers in the development of zonal polynomials for the application to the real Wishart matrix. In the development of linear algebra for real variables, the properties of symmetry and adjointness often go together, and thus they are often not distinguished when a proof requiring one of them is done; they are usually treated synonymously. In $\mathbb{C}^n$, you cannot afford that luxury.

In the context of linear operators, the function of the complex vectors $x$ and $y$ defined by $\langle x, y\rangle = x^Hy$ defines an inner product in the $n$-dimensional complex space $\mathbb{C}^n$, whereas the function defined by $(x, y) = x^Ty$ does not. In both cases you can produce an orthonormal set of vectors using an appropriate version of the Gram-Schmidt process, but the results will in general be different. I have defined the inner product to be linear in the second argument rather than following the mathematician's preference for assigning that property to the first argument. This was done to make use of the Hermitian transpose notation, which carries natural meanings within the context of acoustics, engineering, and physics.
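A short check makes the difference between $\langle x, y\rangle = x^Hy$ and $(x, y) = x^Ty$ concrete. For the nonzero vector $x = (1, i)$, $x^Tx = 1 + i^2 = 0$, so the bilinear form is not positive definite, while $x^Hx = 2$. The sketch also confirms the convention stated above: the form is linear in the second argument and conjugate linear in the first. NumPy's `np.vdot` conjugates its first argument, so it matches this convention. The vectors and scalar are arbitrary illustrative values.

```python
import numpy as np

x = np.array([1.0, 1j])
print(x @ x)           # x^T x = 1 + i^2 = 0, although x is not the zero vector
print(np.vdot(x, x))   # x^H x = 2: np.vdot conjugates its first argument

y = np.array([2 - 1j, 0.5j])
a = 3 + 4j
# <x, a y> = a <x, y>: linear in the second argument.
print(np.isclose(np.vdot(x, a * y), a * np.vdot(x, y)))
# <a x, y> = conj(a) <x, y>: conjugate linear in the first argument.
print(np.isclose(np.vdot(a * x, y), np.conj(a) * np.vdot(x, y)))
```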

The space $\mathbb{C}^n$ is not the same as $\mathbb{R}^{2n}$. The structure imposed by the multiplication operator for complex numbers changes the nature of the space. This does not appear to be widely understood, and it is very important to this thesis. For this reason, a few examples will be given to illustrate the problems involved. Smirnov [249] provides the following example.

The vectors $u = (1 + i, 2i)$ and $v = (1, 1 + i)$ in $\mathbb{C}^2$ are linearly dependent over the field of complex numbers $\mathbb{C}$, but are linearly independent over the field of real numbers $\mathbb{R}$. In the complex field $\mathbb{C}$, we have
$$u - v = w = (i, -1 + i).$$
This implies that
$$-iw = (1, i + 1) = (1, 1 + i) = v.$$
We observe that
$$-i(u - v) = -i(i, -1 + i) = (1, 1 + i) = v.$$
Therefore
$$-iu + iv - v = -iu - (1 - i)v = 0, \quad\text{or equivalently}\quad iu + (1 - i)v = 0,$$
which proves that $u$ and $v$ are linearly dependent. Now consider the same case, but in the field of real numbers $\mathbb{R}$. We vectorize $u$ and $v$ by putting the real parts in the first two elements and the imaginary parts in the next two elements to obtain $u = (1, 0, 1, 2)$ and $v = (1, 1, 0, 1)$. Then $u - v = (0, -1, 1, 1) = w$, and $-w = (0, 1, -1, -1) \neq v$. We see that $u$ and $v$ are linearly independent when considered in $\mathbb{R}$. To multiply by $i$ in $\mathbb{R}$, where $x = a + ib$ is stored as $(a, b)$, you must compute $ix = ia - b$, stored as $(-b, a)$. This implies $iw = (-1, -1, 0, -1)$, or $-iw = (1, 1, 0, 1) = v$. When you are restricted to $\mathbb{R}$, the representation of $\mathbb{C}^n$ in $\mathbb{R}$ is not merely $\mathbb{R}^{2n}$. You have to modify the definition of scalar multiplication to allow a counterpart to $i = \sqrt{-1}$.
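Smirnov's example can be checked mechanically by comparing ranks. In this minimal sketch, stacking $u$ and $v$ as complex columns gives rank 1 (dependent over $\mathbb{C}$). Stacking their real vectorizations, real parts first and then imaginary parts as in the text, gives rank 2 (independent over $\mathbb{R}$). The helper names `realify` and `times_i` are hypothetical. `times_i` implements the $(a, b) \mapsto (-b, a)$ rule given above.

```python
import numpy as np

u = np.array([1 + 1j, 2j])
v = np.array([1, 1 + 1j])

def realify(z):
    """Stack real parts, then imaginary parts, as in the text."""
    return np.concatenate([z.real, z.imag])

def times_i(x):
    """Multiplication by i on a realified vector: (a, b) -> (-b, a)."""
    n = len(x) // 2
    a, b = x[:n], x[n:]
    return np.concatenate([-b, a])

print(np.linalg.matrix_rank(np.column_stack([u, v])))                    # 1: dependent over C
print(np.linalg.matrix_rank(np.column_stack([realify(u), realify(v)])))  # 2: independent over R
w = realify(u) - realify(v)
print(-times_i(w), realify(v))   # -iw equals v once i is given its real counterpart
```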

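We can represent complex numbers in matrix form, but not every choice will do. Even multiplication by a complex scalar has to be handled with care. Suppose we let
$$u = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix},$$
whose rows hold the real and imaginary parts of $u$. With this choice, multiplication by a complex scalar does not come out right; the result is not even close. Suppose instead that we represent each complex number $a + ib$ by the real $2 \times 2$ matrix
$$a + ib \;\mapsto\; \begin{bmatrix} a & -b \\ b & a \end{bmatrix},$$
which preserves complex addition and multiplication. We see that $-i$ is represented by the matrix
$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},$$
and a vector in $\mathbb{C}^2$ such as $w = (i, -1 + i)$ becomes a $4 \times 2$ matrix of stacked blocks. Matrix multiplication for the form $-iw$ is then not defined, because the matrices $-i$ and $w$ are not conformable: that would mean multiplying a $2 \times 2$ matrix by a $4 \times 2$ matrix. Instead, we must apply $-i$ block by block, computing
$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ -1 & -1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \\ 1 & 1 \end{bmatrix},$$
which represents $(1, 1 + i) = v$. This is what we want.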
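The block representation can be checked in a few lines. This is a minimal sketch that assumes the common convention $a + ib \mapsto \begin{pmatrix} a & -b\\ b & a\end{pmatrix}$. Under it, a complex $n$-vector becomes a $2n \times 2$ stack of blocks, and a complex scalar $c$ acts block by block through $I_n \otimes [c]$. The helper names are hypothetical.

```python
import numpy as np

def block(z):
    """Real 2 x 2 matrix representing the complex number z = a + ib."""
    return np.array([[z.real, -z.imag], [z.imag, z.real]])

def block_vector(zs):
    """Stack the 2 x 2 blocks of a complex vector into a (2n x 2) real matrix."""
    return np.vstack([block(complex(z)) for z in zs])

def scale(c, Zb):
    """Multiply a block-represented vector by the complex scalar c, block by block."""
    n = Zb.shape[0] // 2
    return np.kron(np.eye(n), block(complex(c))) @ Zb

# The map preserves complex multiplication: block(p q) = block(p) @ block(q).
p, q = 2 - 1j, 0.5 + 3j
print(np.allclose(block(p * q), block(p) @ block(q)))

w = [1j, -1 + 1j]
v = [1, 1 + 1j]
print(block(complex(-1j)))                                        # representation of -i
print(np.allclose(scale(-1j, block_vector(w)), block_vector(v)))  # -i w = v
```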


The understanding that there are problems with differentiation when working with quadratic forms in complex variables is not widespread. A common error is to solve optimization problems by the method of Lagrange multipliers using derivatives that do not exist. For example, in an otherwise very nice text, one author attempted to solve the problem
$$\min_w\; w^H\Phi w \quad\text{subject to}\quad c^Hw = f$$
by Lagrange multipliers. The derivative $\frac{\partial}{\partial w} w^H\Phi w$ exists only at points where $\Phi w = 0$. For a nonsingular $\Phi$, such as a positive definite covariance matrix, it therefore exists only at $w = 0$, at which point it is zero.
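This particular problem can be solved without differentiating $w^H\Phi w$ at all. Here is a minimal numerical sketch, assuming $\Phi$ is Hermitian positive definite and $c \neq 0$. The candidate $w_0 = \Phi^{-1}c\,f/(c^H\Phi^{-1}c)$ satisfies the constraint. The code then checks that every feasible perturbation $w_0 + d$ with $c^Hd = 0$ has a cost at least as large, and that the minimum equals $|f|^2/(c^H\Phi^{-1}c)$. The random $\Phi$, $c$, and $f$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Phi = G @ G.conj().T + n * np.eye(n)       # Hermitian positive definite
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)
f = 1.0 - 0.5j

def cost(w):
    return np.real(np.vdot(w, Phi @ w))    # w^H Phi w is real for Hermitian Phi

Pc = np.linalg.solve(Phi, c)               # Phi^{-1} c
w0 = Pc * f / np.vdot(c, Pc)               # candidate minimizer
print(np.isclose(np.vdot(c, w0), f))       # constraint c^H w0 = f holds

# Every feasible w is w0 + d with c^H d = 0; project random d onto that subspace.
worse = []
for _ in range(1000):
    d = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    d -= c * np.vdot(c, d) / np.vdot(c, c)  # now c^H d = 0
    worse.append(cost(w0 + d) >= cost(w0))
print(all(worse))
print(np.isclose(cost(w0), abs(f) ** 2 / np.real(np.vdot(c, Pc))))
```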


A.4  The  Rest  of  the  Story

In this appendix, I have attempted to justify documenting the development of topics in statistics, matrix algebra of complex variables, and group representation theory, which will be presented in the following appendices. The use of complex variables is perhaps “too natural,” in that we presume we know how to make the transition properly based on our experience with univariate complex variables and with matrix algebra of real variables. This is a false oasis. To help ease the transition, I recommend some references which have been very helpful to me and which should shorten your transition period. Perhaps they will also provide you some enjoyment. The structure and repetition of this material can be appreciated much in the manner of a poem. It takes patience, knowledge of the language, appreciation of the cultural context, some understanding of the object of study, and a good environment for reflection and introspection.


Appendix  B 
MATRIX  DIFFERENTIAL  OPERATORS

B.1  Complex  Derivatives

A major difference between working with real variables and working with complex variables is the extra care required to ensure that the desired derivatives exist. This has a major impact on the approaches allowed for solving optimization problems. The theory of complex differentiation and the Cauchy-Riemann equations are part of any good course on complex variables. The purpose of this section is to raise a caution flag: some often used relationships in the case of real variables do not work for complex variables. Frequent errors in the adaptive beamforming literature have demonstrated that discussion of this topic is necessary. We begin by considering some simple functions whose derivatives do not exist.

The derivative of a scalar function is defined by
$$\frac{d}{dz} f(z) = \lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z}.$$
The derivative does not exist if your answer depends on the path through the complex plane taken as $\Delta z \to 0$.
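Path dependence is easy to see numerically. This minimal sketch evaluates the difference quotient $(f(z + h) - f(z))/h$ for a small step $h$ along the real axis and along the imaginary axis. For the analytic $f(z) = z^2$ both directions agree with $2z$, while for $z^*$ and $|z|^2$ they do not. The step size and the point $z_0$ are arbitrary choices. A finite-difference check can suggest, but never prove, that a limit exists.

```python
def quotient(f, z, step):
    """Difference quotient (f(z + step) - f(z)) / step for a complex step."""
    return (f(z + step) - f(z)) / step

def directional_quotients(f, z, h=1e-6):
    """Quotients for a step along the real axis and along the imaginary axis."""
    return quotient(f, z, h), quotient(f, z, 1j * h)

z0 = 1.0 + 2.0j
for name, f in [("z**2", lambda z: z * z),
                ("conj(z)", lambda z: z.conjugate()),
                ("|z|**2", lambda z: abs(z) ** 2)]:
    along_x, along_y = directional_quotients(f, z0)
    print(f"{name:8s} real step: {along_x:.4f}  imaginary step: {along_y:.4f}")
```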

B.1.1  Computation  and  Cauchy-Riemann  Equations


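Recall from the development of the Cauchy-Riemann equations that when the derivative exists, there are two ways to compute it. These are given by Wunsch (pp. 52-55) [294]. Let $z = x + iy$, $u(z) = \mathrm{Re}\{f(z)\}$, and $v(z) = \mathrm{Im}\{f(z)\}$. Then
$$f'(z_0) = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\bigg|_{(x_0, y_0)} = -i\left(\frac{\partial u}{\partial y} + i\frac{\partial v}{\partial y}\right)\bigg|_{(x_0, y_0)}.$$
These generate the Cauchy-Riemann equations
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \tag{B.1}$$
Satisfying these equations is a necessary, but not sufficient, condition for the existence of the derivative at $z_0$. These conditions are sufficient when $u$, $v$, $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, and $\partial v/\partial y$ are all continuous functions in the neighborhood of $z_0$. See Wunsch [294] for an excellent tutorial on this subject.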

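By substituting the Cauchy-Riemann equations back into $f'(z)$, we find
$$\frac{d}{dz} f(z) = \frac{\partial f}{\partial x} = -i\,\frac{\partial f}{\partial y}$$
when $\frac{d}{dz} f(z)$ exists. Note that $\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right) f(z) = \frac{d}{dz} f(z) - \frac{d}{dz} f(z) = 0$.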

Caution. When you face a situation in which a derivative does not exist, you cannot simply assume the derivative to be zero and ignore it. When the derivative does not exist, any subsequent work requiring that derivative does not apply and cannot legitimately be used. You then need to determine whether there is another way to solve your problem.

B.1.2  Derivative  of  the  Conjugate  of  a  Variable 

The following material appears very elementary, yet some very respected authors in adaptive beamforming and complex statistics have not understood it. Therefore, inclusion of this material is mandatory. This particular derivative is the root of most mistakes in the literature, and it is the foundation for the lack of existence of many other derivatives discussed in this section.

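Let $z = x + iy$ and $z^* = x - iy$. Then $\frac{d}{dz} z^*$ does not exist, anywhere. This proof comes from Spiegel (p. 71) [253]. By definition,
$$\begin{aligned}
\frac{d}{dz} z^* &= \lim_{\Delta z \to 0} \frac{(z + \Delta z)^* - z^*}{\Delta z} = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{(x + iy + \Delta x + i\Delta y)^* - (x + iy)^*}{\Delta x + i\Delta y} \\
&= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{x - iy + \Delta x - i\Delta y - x + iy}{\Delta x + i\Delta y} = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{\Delta x - i\Delta y}{\Delta x + i\Delta y}.
\end{aligned}$$
Suppose $\Delta x = 0$. Then $\frac{d}{dz} z^* = \lim_{\Delta y \to 0} \frac{-i\Delta y}{i\Delta y} = -1$. Now suppose $\Delta y = 0$. Then $\frac{d}{dz} z^* = \lim_{\Delta x \to 0} \frac{\Delta x}{\Delta x} = 1$. Thus the answer you get depends on the path you take to the point at which you evaluate the derivative. Since nothing in this argument depends on $z$, there is no point $z_0$ at which $\frac{d}{dz} z^*$ exists. In particular, $\frac{d}{dz} z^*$ does not even exist at $z_0 = 0$.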

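Note that this implies that $\frac{d}{dz}\mathrm{Re}(z)$ and $\frac{d}{dz}\mathrm{Im}(z)$ do not exist, because $\mathrm{Re}(z) = \frac{1}{2}(z + z^*)$ and $\mathrm{Im}(z) = \frac{1}{2i}(z - z^*)$. The derivative is a linear operator, so if either of these derivatives existed, then $\frac{d}{dz} z^* = \frac{d}{dz}\left(2\,\mathrm{Re}(z) - z\right)$ or $\frac{d}{dz} z^* = \frac{d}{dz}\left(z - 2i\,\mathrm{Im}(z)\right)$ would exist as well. Since $\frac{d}{dz} z^*$ does not exist, these other derivatives also do not exist.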


B.1.3  Derivative  of  the  Magnitude 


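This result was motivated by the discussion in Wunsch [294]. Although I have not looked for a statement of it elsewhere, it is so elementary that I presume it is not an original result.

Let $z = x + iy$, and thus $|z| = (x^2 + y^2)^{1/2}$. Then $\frac{d}{dz}|z|$ does not exist anywhere. By definition,
$$\begin{aligned}
\frac{d}{dz}|z| &= \lim_{\Delta z \to 0} \frac{|z + \Delta z| - |z|}{\Delta z} = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{|x + iy + \Delta x + i\Delta y| - |x + iy|}{\Delta x + i\Delta y} \\
&= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{\left((x + \Delta x)^2 + (y + \Delta y)^2\right)^{1/2} - (x^2 + y^2)^{1/2}}{\Delta x + i\Delta y} \\
&= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{\left(x^2 + y^2 + 2(x\Delta x + y\Delta y) + (\Delta x)^2 + (\Delta y)^2\right)^{1/2} - (x^2 + y^2)^{1/2}}{\Delta x + i\Delta y}.
\end{aligned}$$
Suppose $\Delta x = 0$. Then
$$\frac{d}{dz}|z| = \lim_{\Delta y \to 0} \frac{\left(x^2 + y^2 + 2y\Delta y + (\Delta y)^2\right)^{1/2} - (x^2 + y^2)^{1/2}}{i\Delta y}.$$
At this point, we note that for $a > 0$ and $b \ge 0$
$$\sqrt{a + b} - \sqrt{a} = \sqrt{\left(\sqrt{a + b} - \sqrt{a}\right)^2} = \sqrt{2a + b - 2\sqrt{a(a + b)}},$$
which implies, for $y > 0$ and $\Delta y \to 0^+$,
$$\frac{d}{dz}|z| = \lim_{\Delta y \to 0^+} -i\left(\frac{2(x^2 + y^2)}{(\Delta y)^2} + \frac{2y}{\Delta y} + 1 - \frac{2}{(\Delta y)^2}\sqrt{(x^2 + y^2)\left(x^2 + y^2 + 2y\Delta y + (\Delta y)^2\right)}\right)^{1/2}.$$
The individual terms are unbounded, but they cancel. Expanding the inner square root to second order in $\Delta y$ shows that the quantity in parentheses tends to $y^2/(x^2 + y^2)$, so along this path the limit is $-iy/|z|$. The same value, $-i\,\partial|z|/\partial y = -iy/|z|$, is obtained along this path for any $z \neq 0$. Now suppose $\Delta y = 0$. Then
$$\frac{d}{dz}|z| = \lim_{\Delta x \to 0} \frac{\left(x^2 + y^2 + 2x\Delta x + (\Delta x)^2\right)^{1/2} - (x^2 + y^2)^{1/2}}{\Delta x} = \frac{x}{|z|}.$$
For $z \neq 0$ the two path limits $-iy/|z|$ and $x/|z|$ cannot agree, because $x$ and $y$ are real and not both zero. This is sufficient to show that $\frac{d}{dz}|z|$ does not exist for $z \neq 0$. Now suppose $z = 0$, so that the difference quotient is $|\Delta z|/\Delta z$. If $\Delta x = 0$, then
$$\frac{d}{dz}|z| = \lim_{\Delta y \to 0^+} \frac{|\Delta y|}{i\Delta y} = -i.$$
Suppose $\Delta y = 0$. Then
$$\frac{d}{dz}|z| = \lim_{\Delta x \to 0^+} \frac{|\Delta x|}{\Delta x} = 1.$$
Thus $\frac{d}{dz}|z|$ does not exist even at $z = 0$.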


B.1.4  Derivative  of  the  Magnitude  Squared 

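This is essentially the work on pp. 56-57 of Wunsch [294].

Here, we find that $\frac{d}{dz}|z|^2$ exists only at $z = 0$, at which point the derivative is zero.
$$\begin{aligned}
\frac{d}{dz}|z|^2 &= \lim_{\Delta z \to 0} \frac{|z + \Delta z|^2 - |z|^2}{\Delta z} = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{|(x + \Delta x) + i(y + \Delta y)|^2 - |x + iy|^2}{\Delta x + i\Delta y} \\
&= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{x^2 + 2x\Delta x + (\Delta x)^2 + y^2 + 2y\Delta y + (\Delta y)^2 - x^2 - y^2}{\Delta x + i\Delta y} \\
&= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{2x\Delta x + 2y\Delta y + (\Delta x)^2 + (\Delta y)^2}{\Delta x + i\Delta y}.
\end{aligned}$$
Suppose $z \neq 0$ and $\Delta x = 0$. Then
$$\frac{d}{dz}|z|^2 = \lim_{\Delta y \to 0} \frac{2y\Delta y + (\Delta y)^2}{i\Delta y} = \frac{2y}{i} = -2iy.$$
Suppose $z \neq 0$ and $\Delta y = 0$. Then
$$\frac{d}{dz}|z|^2 = \lim_{\Delta x \to 0} \frac{2x\Delta x + (\Delta x)^2}{\Delta x} = 2x.$$
Since $x$ and $y$ are real and not both zero, $2x \neq -2iy$. Thus $\frac{d}{dz}|z|^2$ does not exist for $z \neq 0$. Suppose $z = 0$. Then, since $|\Delta z|^2/\Delta z = \Delta z^*$,
$$\frac{d}{dz}|z|^2 = \lim_{\Delta z \to 0} \frac{|\Delta z|^2}{\Delta z} = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{(\Delta x)^2 + (\Delta y)^2}{\Delta x + i\Delta y} = 0.$$
Thus $\frac{d}{dz}|z|^2$ exists only at $z = 0$, at which point the derivative is zero.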


B.1.5  Derivative  with  Respect  to  a  Vector 

The familiar rules for differentiating with respect to a vector or matrix hold as long as the function does not contain the conjugate of the variable you are differentiating with respect to. Several frequently used derivatives with respect to a vector are presented below. Real-variables versions of these have long been established. Complex versions have often been used, and the literature by well-respected authors gives corresponding evidence that these results are not well known. Systematically developing these results is easy work that any senior in engineering could do, yet it is a major contribution of this thesis.
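Let $z = (z_1, \ldots, z_n)^T$ be a complex vector. Then
$$\frac{\partial}{\partial z} \stackrel{\text{def}}{=} \begin{pmatrix} \partial/\partial z_1 \\ \partial/\partial z_2 \\ \vdots \\ \partial/\partial z_n \end{pmatrix}. \tag{B.2}$$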


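For $a \in \mathbb{C}^n$, let $f(z) = a^Tz$. Then
$$\frac{\partial}{\partial z} a^Tz = \frac{\partial}{\partial z} \sum_{i=1}^n a_iz_i = a = \frac{\partial}{\partial z} z^Ta. \tag{B.3}$$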


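For $a \in \mathbb{C}^n$, let $f(z) = a^Hz$. Then
$$\frac{\partial}{\partial z} a^Hz = \frac{\partial}{\partial z} \sum_{i=1}^n a_i^*z_i = a^*. \tag{B.4}$$
Note that $\frac{\partial}{\partial z} z^Ha$ does not exist.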


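For $A \in \mathbb{C}^{n \times n}$, let $f(z) = z^TAz$. Then
$$\frac{\partial}{\partial z} z^TAz = \frac{\partial}{\partial z} (z_1, \ldots, z_n)(a_1, \ldots, a_n)\,z,$$
where $a_j \in \mathbb{C}^n$ is the $j$th column of $A$. When expanded, this is
$$\frac{\partial}{\partial z} \sum_{i=1}^n \sum_{j=1}^n a_{ij}z_iz_j.$$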
To visualize this more easily, I will write it out ad nauseam. It is
$$\frac{\partial}{\partial z}\left(\begin{aligned}
& a_{11}z_1^2 + a_{12}z_1z_2 + \cdots + a_{1n}z_1z_n \\
&+ a_{21}z_2z_1 + a_{22}z_2^2 + \cdots + a_{2n}z_2z_n \\
&+ a_{31}z_3z_1 + a_{32}z_3z_2 + \cdots + a_{3n}z_3z_n \\
&+ \cdots \\
&+ a_{n1}z_nz_1 + a_{n2}z_nz_2 + \cdots + a_{nn}z_n^2
\end{aligned}\right)$$
where the term in parentheses is a scalar, arranged to give insight into how the terms arose. Taking the derivatives, we get the column vector
$$\begin{pmatrix}
2a_{11}z_1 + (a_{12} + a_{21})z_2 + (a_{13} + a_{31})z_3 + \cdots + (a_{1n} + a_{n1})z_n \\
(a_{12} + a_{21})z_1 + 2a_{22}z_2 + (a_{23} + a_{32})z_3 + \cdots + (a_{2n} + a_{n2})z_n \\
(a_{13} + a_{31})z_1 + (a_{23} + a_{32})z_2 + 2a_{33}z_3 + \cdots + (a_{3n} + a_{n3})z_n \\
\vdots \\
(a_{1n} + a_{n1})z_1 + (a_{2n} + a_{n2})z_2 + (a_{3n} + a_{n3})z_3 + \cdots + 2a_{nn}z_n
\end{pmatrix}$$


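Let
$$A = (a_1, a_2, \ldots, a_n) = \begin{pmatrix} a^1 \\ a^2 \\ \vdots \\ a^n \end{pmatrix},$$
so that $a_j$ is the $j$th column and $a^i$ is the $i$th row of $A$. Then
$$\frac{\partial}{\partial z} z^TAz = \begin{pmatrix} \left(a^1 + (a_1)^T\right)z \\ \left(a^2 + (a_2)^T\right)z \\ \vdots \\ \left(a^n + (a_n)^T\right)z \end{pmatrix} = (A + A^T)z. \tag{B.5}$$
If $A = A^T$, then $\frac{\partial}{\partial z} z^TAz = 2Az$, which is the familiar result often cited in the real variables case. The only requirement here is that $A$ must be square. Note that the matrix in the answer is always symmetric, even though $A$ is not required to be symmetric.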

Some  simplicity  is  achieved  if  A  =  A^  because  the  (a‘)  are  merely  trans¬ 
posed,  and  not  conjugated.  Thus  A  =  A^  implies 

A  +  A'^  ^A  +  A*  =  2}le(A) 

Thus,  if  i4  =  A^,  then 

-^z^Az  =  2[Re(i4)]2  (B.6) 

dz 
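The gradient formula $(A + A^T)z$ and its Hermitian special case (B.6) can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the thesis's method: it uses only the Python standard library, random test data, and hypothetical helper names (`quad`, `numeric_grad`). Because $z^T A z$ contains no conjugates, it is analytic in each $z_k$, so a central difference along the real axis gives the complex partial derivative.

```python
import random

def quad(A, z):
    """f(z) = z^T A z with a plain transpose (no conjugation), so f is analytic in z."""
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

def numeric_grad(f, z, h=1e-6):
    """Central differences along the real axis in each z_k; valid because f is analytic."""
    g = []
    for k in range(len(z)):
        zp, zm = list(z), list(z)
        zp[k] += h
        zm[k] -= h
        g.append((f(zp) - f(zm)) / (2 * h))
    return g

def rand_c():
    return complex(random.gauss(0, 1), random.gauss(0, 1))

random.seed(5)
n = 3
A = [[rand_c() for _ in range(n)] for _ in range(n)]
z = [rand_c() for _ in range(n)]

# General square A: the gradient should be (A + A^T) z.
expected = [sum((A[i][j] + A[j][i]) * z[j] for j in range(n)) for i in range(n)]
got = numeric_grad(lambda w: quad(A, w), z)
err_general = max(abs(a - b) for a, b in zip(got, expected))

# Hermitian H: H + H^T = 2 Re(H), so the gradient should be 2 Re(H) z, as in (B.6).
M = [[rand_c() for _ in range(n)] for _ in range(n)]
H = [[(M[i][j] + M[j][i].conjugate()) / 2 for j in range(n)] for i in range(n)]
expected_h = [sum(2 * H[i][j].real * z[j] for j in range(n)) for i in range(n)]
got_h = numeric_grad(lambda w: quad(H, w), z)
err_hermitian = max(abs(a - b) for a, b in zip(got_h, expected_h))
print(err_general, err_hermitian)
```

Since $f$ is quadratic, the central difference is exact up to rounding, so both errors should be tiny.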

Now consider the case of $\frac{\partial}{\partial z} z^H A z$. This is the specific derivative that is most frequently abused in the literature when approaching a linear optimization problem in adaptive beamforming. Expanding, we get

$$\frac{\partial}{\partial z}\, z^H A z = \frac{\partial}{\partial z} \sum_{i=1}^n \sum_{j=1}^n z_i^* a_{ij} z_j.$$

However, recall that $\frac{\partial}{\partial z_i} z_i^* a_{ij} z_j$ does not exist when $i \neq j$. Therefore, $\frac{\partial}{\partial z} z^H A z$ does not exist when $A$ is any matrix except a diagonal matrix. When $A$ is a diagonal matrix, the derivative exists only when $z$ is the zero vector, and there the derivative is zero. When faced with the desire to try this derivative, one should instead attempt other techniques, such as completion of squares or a projection-based technique. Note that the method of Lagrange multipliers can be pursued without taking derivatives, even though the usual approach in the real-variables case almost always uses a derivative in the solution.
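Then

$$\frac{\partial}{\partial z}\left(\frac{\partial f}{\partial z^T}\right) = \frac{\partial^2 f}{\partial z\, \partial z^T} = \begin{pmatrix} \frac{\partial^2 f}{\partial z_1^2} & \frac{\partial^2 f}{\partial z_1 \partial z_2} & \cdots & \frac{\partial^2 f}{\partial z_1 \partial z_n} \\ \frac{\partial^2 f}{\partial z_2 \partial z_1} & \frac{\partial^2 f}{\partial z_2^2} & \cdots & \frac{\partial^2 f}{\partial z_2 \partial z_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial z_n \partial z_1} & \frac{\partial^2 f}{\partial z_n \partial z_2} & \cdots & \frac{\partial^2 f}{\partial z_n^2} \end{pmatrix}$$

which is the Hessian matrix operator on $f$, which I will call $\mathcal{H}$. Note that $\frac{\partial^2 f}{\partial z^T \partial z} = \mathcal{H}^T$. If all of the derivatives are continuous, then $\mathcal{H} = \mathcal{H}^T$, since continuity of the derivatives allows the order of differentiation to be exchanged. Recall that

$$\frac{\partial}{\partial z}\, z^T A z = \begin{pmatrix} (a_1 + (a^1)^T) z \\ (a_2 + (a^2)^T) z \\ \vdots \\ (a_n + (a^n)^T) z \end{pmatrix} = (A + A^T) z.$$

Then

$$\frac{\partial^2}{\partial z\, \partial z^T}\, z^T A z = A + A^T.$$

Again, $\frac{\partial^2}{\partial z\, \partial z^T} z^H A z$ does not exist, and neither do the analogous second derivatives of the other forms involving $z^*$. If $A = A^H$, then

$$\frac{\partial^2}{\partial z\, \partial z^T}\, z^T A z = 2\,\mathrm{Re}(A) \qquad \text{(B.8)}$$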


B.2  Derivative  with  Respect  to  a  Matrix 
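Let $Z$ be an $m \times n$ complex matrix with elements $(Z_{ij})$. Let $f(Z)$ be a scalar-valued function of $Z$. The function $f$ may take on complex values. When $f$ is a differentiable complex function, define

$$\frac{\partial}{\partial Z} \stackrel{\text{def}}{=} \begin{pmatrix} \frac{\partial}{\partial Z_{11}} & \frac{\partial}{\partial Z_{12}} & \cdots & \frac{\partial}{\partial Z_{1n}} \\ \frac{\partial}{\partial Z_{21}} & \frac{\partial}{\partial Z_{22}} & \cdots & \frac{\partial}{\partial Z_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial}{\partial Z_{m1}} & \frac{\partial}{\partial Z_{m2}} & \cdots & \frac{\partial}{\partial Z_{mn}} \end{pmatrix} \qquad \text{(B.9)}$$

Note that when this exists, $\frac{\partial f}{\partial Z^T} = \left(\frac{\partial f}{\partial Z}\right)^T$.

When $Z = X + iY$, then

$$\frac{\partial}{\partial Z} f(Z) = \frac{\partial}{\partial X} f(Z) = \begin{pmatrix} \frac{\partial f}{\partial X_{11}} & \frac{\partial f}{\partial X_{12}} & \cdots & \frac{\partial f}{\partial X_{1n}} \\ \frac{\partial f}{\partial X_{21}} & \frac{\partial f}{\partial X_{22}} & \cdots & \frac{\partial f}{\partial X_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial X_{m1}} & \frac{\partial f}{\partial X_{m2}} & \cdots & \frac{\partial f}{\partial X_{mn}} \end{pmatrix}$$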

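Suppose $f(Z)$ is itself now a matrix,

$$f(Z) = \begin{pmatrix} f_{11}(Z) & f_{12}(Z) & \cdots & f_{1q}(Z) \\ f_{21}(Z) & f_{22}(Z) & \cdots & f_{2q}(Z) \\ \vdots & \vdots & \ddots & \vdots \\ f_{p1}(Z) & f_{p2}(Z) & \cdots & f_{pq}(Z) \end{pmatrix} \qquad \text{(B.10)}$$

Then $\frac{\partial}{\partial Z} f(Z)$ is an $(mp \times nq)$ complex matrix, where $f(Z)$ is a $(p \times q)$ matrix. The desired form is the "right hand" direct product of $\frac{\partial}{\partial Z}$ and $f(Z)$. The matrix $\frac{\partial}{\partial Z} f(Z)$ has the form

$$\frac{\partial f}{\partial Z} = \begin{pmatrix} \frac{\partial f}{\partial Z_{11}} & \cdots & \frac{\partial f}{\partial Z_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial Z_{m1}} & \cdots & \frac{\partial f}{\partial Z_{mn}} \end{pmatrix} \qquad \text{(B.11)}$$

where

$$\frac{\partial f}{\partial Z_{jk}} = \begin{pmatrix} \frac{\partial f_{11}}{\partial Z_{jk}} & \cdots & \frac{\partial f_{1q}}{\partial Z_{jk}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{p1}}{\partial Z_{jk}} & \cdots & \frac{\partial f_{pq}}{\partial Z_{jk}} \end{pmatrix} \qquad \text{(B.12)}$$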


Caution. This is very different from a matrix multiplication of the operator matrix by the function matrix. The more accurate analogy is the direct product $\left(\frac{\partial}{\partial Z}\right) \otimes (f(Z))$. Even with this interpretation there is danger of ambiguity. For example, let

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.$$

$A \otimes B$ has been interpreted differently by various authors. The interpretation that yields the form above is

$$A \otimes_r B = \begin{pmatrix} a_{11} B & a_{12} B \\ a_{21} B & a_{22} B \end{pmatrix} \qquad \text{(B.13)}$$

This is not the same as

$$A \otimes_l B = \begin{pmatrix} A b_{11} & A b_{12} \\ A b_{21} & A b_{22} \end{pmatrix} \qquad \text{(B.14)}$$

which is a "left hand" direct product. Because of the ambiguity, it is good practice to write out a definition for your readers. If you choose to use the left hand direct product for $\left(\frac{\partial}{\partial Z}\right) \otimes_l (f(Z))$, make sure your further derivations are consistent.
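To make the ambiguity between (B.13) and (B.14) concrete, here is a small sketch in plain Python. The function names `kron_right` and `kron_left` are hypothetical labels for the two conventions, and the integer matrices are arbitrary test data. The left-hand product is simply the right-hand product with the factors swapped.

```python
def kron_right(A, B):
    """A (x)_r B from (B.13): block (i, j) is a_ij * B."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]

def kron_left(A, B):
    """A (x)_l B from (B.14): block (i, j) is A * b_ij, i.e. B (x)_r A."""
    return kron_right(B, A)

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
right = kron_right(A, B)
left = kron_left(A, B)
for row in right:
    print(row)
print()
for row in left:
    print(row)
```

For comparison, NumPy's `numpy.kron(A, B)` follows the right-hand convention of (B.13).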

A result for real variables that carries over to complex variables is the derivative of a determinant with respect to a matrix element. Let $A \in \mathbb{C}^{n \times n}$ be a complex matrix with elements $a_{ij}$, and let $X_{ij}$ be the minor obtained from $A$ by deleting row $i$ and column $j$. Computing $\det(A)$ by cofactor expansion down column $j$ gives the following.

$$\det(A) = \sum_{i=1}^n (-1)^{i+j} a_{ij} \det(X_{ij}) \qquad \text{(B.15)}$$

Note that $a_{kj}$ appears only once in this expansion, when $i = k$. If we take the derivative of $\det(A)$ with respect to $a_{kj}$, we get just one term if all the $a_{ij}$ are algebraically independent. Thus $\frac{\partial}{\partial a_{kj}} \det(A) = (-1)^{k+j} \det(X_{kj})$. When $A = A^T$, the elements off the main diagonal are algebraically dependent, with $a_{ij} = a_{ji}$. In this case,

$$\frac{\partial}{\partial a_{ij}} \det(A) = \begin{cases} (-1)^{i+j} \det(X_{ij}), & i = j \\ (-1)^{i+j}\, 2 \det(X_{ij}), & i \neq j \end{cases} \qquad \text{(B.16)}$$
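For the algebraically independent case,

$$\frac{\partial}{\partial A} \det(A) = \begin{pmatrix} (-1)^{1+1}\det(X_{11}) & (-1)^{1+2}\det(X_{12}) & \cdots & (-1)^{1+n}\det(X_{1n}) \\ (-1)^{2+1}\det(X_{21}) & (-1)^{2+2}\det(X_{22}) & \cdots & (-1)^{2+n}\det(X_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ (-1)^{n+1}\det(X_{n1}) & (-1)^{n+2}\det(X_{n2}) & \cdots & (-1)^{n+n}\det(X_{nn}) \end{pmatrix}$$

Thus we observe that

$$\frac{\partial}{\partial A} \det(A) = (\operatorname{adj} A)^T \qquad \text{(B.17)}$$

When $A^{-1}$ exists,

$$\frac{\partial}{\partial A} \det A = (A^{-1} \det A)^T = (\det A)(A^{-1})^T = (\det A)\, A^{-T} \qquad \text{(B.18)}$$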
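A quick numerical spot-check of (B.17) and (B.18) for an unstructured complex matrix, as a sketch using only the standard library. The helpers `minor` and `det` (cofactor expansion, fine for small $n$) and the random $3 \times 3$ test matrix are illustrative assumptions. Since $\det A$ is linear in each entry $a_{kj}$, a central difference along the real axis recovers the complex partial derivative.

```python
import random

def minor(M, i, j):
    """Submatrix X_ij: M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

random.seed(2)
n = 3
A = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
h = 1e-6

# Numerical d det(A) / d a_kj, one entry at a time.
numeric = []
for k in range(n):
    row = []
    for j in range(n):
        Ap = [list(r) for r in A]
        Am = [list(r) for r in A]
        Ap[k][j] += h
        Am[k][j] -= h
        row.append((det(Ap) - det(Am)) / (2 * h))
    numeric.append(row)

# (B.17): entry (k, j) is the cofactor (-1)^(k+j) det(X_kj), i.e. (adj A)^T.
cofactor = [[(-1) ** (k + j) * det(minor(A, k, j)) for j in range(n)] for k in range(n)]

# (B.18): A^-1 = adj(A) / det(A); confirm it is an inverse, then form (det A)(A^-1)^T.
d = det(A)
inv = [[cofactor[j][i] / d for j in range(n)] for i in range(n)]
identity_err = max(abs(sum(A[i][k] * inv[k][j] for k in range(n)) - (1 if i == j else 0))
                   for i in range(n) for j in range(n))
b18 = [[d * inv[j][k] for j in range(n)] for k in range(n)]

err17 = max(abs(numeric[k][j] - cofactor[k][j]) for k in range(n) for j in range(n))
err18 = max(abs(numeric[k][j] - b18[k][j]) for k in range(n) for j in range(n))
print(err17, err18, identity_err)
```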


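Caution. Suppose $A = A^H$ is a $2 \times 2$ matrix,

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12}^* & a_{22} \end{pmatrix}.$$

Then

$$\frac{\partial}{\partial a_{12}} \det A = \frac{\partial}{\partial a_{12}} \left[a_{11} a_{22} - |a_{12}|^2\right]$$

exists only at $a_{12} = 0$. Suppose $A = A^H$ is a $3 \times 3$ matrix,

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12}^* & a_{22} & a_{23} \\ a_{13}^* & a_{23}^* & a_{33} \end{pmatrix}.$$

The determinant is

$$\det A = a_{11} a_{22} a_{33} + a_{12} a_{13}^* a_{23} + a_{12}^* a_{13} a_{23}^* - |a_{13}|^2 a_{22} - a_{11} |a_{23}|^2 - |a_{12}|^2 a_{33}.$$

From this, we see that for $i \neq j$, $\frac{\partial}{\partial a_{ij}} \det A$ does not exist, because $\frac{\partial}{\partial a_{ij}} a_{ij}^*$ does not exist anywhere.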
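The $2 \times 2$ caution can be illustrated numerically. A complex derivative exists only if the difference quotient has the same limit in every direction; the sketch below (standard library only, with hypothetical real diagonal entries $a_{11} = 2$, $a_{22} = 3$) compares the quotient along the real and imaginary directions for $\det A = a_{11}a_{22} - |a_{12}|^2$ as a function of $a_{12}$.

```python
def difference_quotient(f, t, direction, h=1e-6):
    """Central difference quotient of f at t along a unit complex direction."""
    return (f(t + h * direction) - f(t - h * direction)) / (2 * h * direction)

a11, a22 = 2.0, 3.0  # hypothetical real diagonal entries

def det_hermitian_2x2(a12):
    """det [[a11, a12], [conj(a12), a22]] = a11 a22 - |a12|^2."""
    return a11 * a22 - abs(a12) ** 2

results = {}
for t in (0j, 1 + 1j):
    along_real = difference_quotient(det_hermitian_2x2, t, 1)
    along_imag = difference_quotient(det_hermitian_2x2, t, 1j)
    results[t] = (along_real, along_imag)
    print(t, along_real, along_imag)
```

At $a_{12} = 0$ both quotients agree (they are zero). At $a_{12} = 1 + i$ they are about $-2$ and $2i$, so no single complex derivative exists there.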

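Let us examine this one step further. Consider $T = T^H$ and $B$, both in $M_2(\mathbb{C})$, and $\frac{\partial}{\partial T} \det(BT)$. Then

$$BT = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} \begin{pmatrix} t_{11} & t_{12} \\ t_{12}^* & t_{22} \end{pmatrix} = \begin{pmatrix} b_{11} t_{11} + b_{12} t_{12}^* & b_{11} t_{12} + b_{12} t_{22} \\ b_{21} t_{11} + b_{22} t_{12}^* & b_{21} t_{12} + b_{22} t_{22} \end{pmatrix}$$

and

$$\det(BT) = (b_{11} t_{11} + b_{12} t_{12}^*)(b_{21} t_{12} + b_{22} t_{22}) - (b_{21} t_{11} + b_{22} t_{12}^*)(b_{11} t_{12} + b_{12} t_{22}).$$

From this we see that

$$\begin{aligned} \frac{\partial}{\partial t_{12}} \det(BT) ={}& \left[\frac{\partial}{\partial t_{12}} (b_{11} t_{11} + b_{12} t_{12}^*)\right] (b_{21} t_{12} + b_{22} t_{22}) + (b_{11} t_{11} + b_{12} t_{12}^*)\, b_{21} \\ & - \left[\frac{\partial}{\partial t_{12}} (b_{21} t_{11} + b_{22} t_{12}^*)\right] (b_{11} t_{12} + b_{12} t_{22}) - (b_{21} t_{11} + b_{22} t_{12}^*)\, b_{11} \end{aligned}$$

which does not exist because $\frac{\partial}{\partial t_{12}} t_{12}^*$ does not exist anywhere, even at zero. In fact, for $T = T^H$ and $B$, both in $M_n(\mathbb{C})$, only the derivatives $\frac{\partial}{\partial t_{jj}} \det(BT)$ with respect to the (real) diagonal elements exist. Thus, $\frac{\partial}{\partial T} \det(BT)$ does not exist. Similarly, $\frac{\partial}{\partial T} \det(A - iBT)$ does not exist when $T = T^H$. This specific example will become very important in a moment (in both senses of the phrase).

The interested reader is encouraged to read papers by Tracy and Dwyer [267] and Dwyer [72] for the case of real variables.

One of the implications is that when dealing with complex variables, the usual maximization or minimization problems cannot, in general, be solved by taking derivatives. This includes the usual implementation of the method of Lagrange multipliers for constrained optimization problems. (For a successful application of the method of Lagrange multipliers to complex variables in adaptive sonar beamforming, see the paper by Cox [61].)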


B.2.1  Linear  Transformation  of  a  Vector 

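Let $z = Tx$, where $z, x \in \mathbb{C}^n$ and $T \in \mathbb{C}^{n \times n}$. Let $f$ be a differentiable complex function. Then

$$\frac{\partial f}{\partial x} = T^T \frac{\partial f}{\partial z}$$

This is a complexification of an unnumbered lemma found in Muirhead (p. 240) [187], which is stated without proof.

Proof. By definition,

$$\frac{\partial f}{\partial x} = \begin{pmatrix} \partial f/\partial x_1 \\ \vdots \\ \partial f/\partial x_n \end{pmatrix}.$$

Also, by the chain rule,

$$\frac{\partial f}{\partial x_i} = \sum_{j=1}^n \frac{\partial z_j}{\partial x_i} \frac{\partial f}{\partial z_j}.$$

Thus

$$\frac{\partial f}{\partial x} = \begin{pmatrix} \frac{\partial z_1}{\partial x_1} & \cdots & \frac{\partial z_n}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial z_1}{\partial x_n} & \cdots & \frac{\partial z_n}{\partial x_n} \end{pmatrix} \begin{pmatrix} \partial f/\partial z_1 \\ \vdots \\ \partial f/\partial z_n \end{pmatrix} \stackrel{\text{def}}{=} P \frac{\partial f}{\partial z}$$

Recall that

$$Tx = \begin{pmatrix} T_{11} & \cdots & T_{1n} \\ \vdots & \ddots & \vdots \\ T_{n1} & \cdots & T_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = z.$$

Then $z_j = \sum_{i=1}^n T_{ji} x_i$, so $P_{ij} = \partial z_j / \partial x_i = T_{ji}$. From this we see that $P = T^T$. Therefore

$$\frac{\partial f}{\partial x} = T^T \frac{\partial f}{\partial z}$$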
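The lemma can be spot-checked numerically with an analytic test function. In this sketch (standard library only), the polynomial `f`, its hand-computed gradient `grad_f`, and the random $T$ and $x$ are all illustrative assumptions. The numerical gradient of $x \mapsto f(Tx)$ is compared with $T^T \,\partial f/\partial z$ evaluated at $z = Tx$.

```python
import random

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def f(z):
    """A hypothetical analytic test function (no conjugates of z appear)."""
    return z[0] ** 2 + z[0] * z[1] + z[1] * z[2] ** 2

def grad_f(z):
    """Exact df/dz for the test function above."""
    return [2 * z[0] + z[1], z[0] + z[2] ** 2, 2 * z[1] * z[2]]

def rand_c():
    return complex(random.gauss(0, 1), random.gauss(0, 1))

random.seed(3)
n = 3
T = [[rand_c() for _ in range(n)] for _ in range(n)]
x = [rand_c() for _ in range(n)]
h = 1e-6

# Numerical df/dx_i for g(x) = f(Tx), by central differences along the real axis.
numeric = []
for i in range(n):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    numeric.append((f(matvec(T, xp)) - f(matvec(T, xm))) / (2 * h))

# The lemma: df/dx = T^T df/dz.
Tt = [list(r) for r in zip(*T)]
chain = matvec(Tt, grad_f(matvec(T, x)))
err = max(abs(a - b) for a, b in zip(numeric, chain))
print(err)
```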


B.2.2  Derivative  of  a  Matrix  with  Respect  to  Itself 

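Lemma 1 Let

$$E_{nm} = \begin{pmatrix} {}_n e_1\, {}_m e_1^T & {}_n e_1\, {}_m e_2^T & \cdots & {}_n e_1\, {}_m e_m^T \\ {}_n e_2\, {}_m e_1^T & {}_n e_2\, {}_m e_2^T & \cdots & {}_n e_2\, {}_m e_m^T \\ \vdots & \vdots & \ddots & \vdots \\ {}_n e_n\, {}_m e_1^T & {}_n e_n\, {}_m e_2^T & \cdots & {}_n e_n\, {}_m e_m^T \end{pmatrix}$$

where ${}_n e_j$ is the elementary vector of size $n$ consisting of all zeros except for a 1 in the $j$th position. Let $X \in \mathbb{C}^{n \times m}$. Then $\frac{\partial X}{\partial X} = E_{nm}$.

Proof.

$$\frac{\partial X}{\partial X} = \begin{pmatrix} \frac{\partial X}{\partial X_{11}} & \frac{\partial X}{\partial X_{12}} & \cdots & \frac{\partial X}{\partial X_{1m}} \\ \frac{\partial X}{\partial X_{21}} & \frac{\partial X}{\partial X_{22}} & \cdots & \frac{\partial X}{\partial X_{2m}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial X}{\partial X_{n1}} & \frac{\partial X}{\partial X_{n2}} & \cdots & \frac{\partial X}{\partial X_{nm}} \end{pmatrix}$$

Consider

$$\frac{\partial X}{\partial X_{ij}} = \begin{pmatrix} \frac{\partial X_{11}}{\partial X_{ij}} & \cdots & \frac{\partial X_{1m}}{\partial X_{ij}} \\ \vdots & \ddots & \vdots \\ \frac{\partial X_{n1}}{\partial X_{ij}} & \cdots & \frac{\partial X_{nm}}{\partial X_{ij}} \end{pmatrix} = {}_n e_i\, {}_m e_j^T$$

This is an $n \times m$ matrix of all zeros except a 1 in position $(ij)$. Thus

$$\frac{\partial X}{\partial X} = \begin{pmatrix} {}_n e_1\, {}_m e_1^T & \cdots & {}_n e_1\, {}_m e_m^T \\ \vdots & \ddots & \vdots \\ {}_n e_n\, {}_m e_1^T & \cdots & {}_n e_n\, {}_m e_m^T \end{pmatrix} = E_{nm}$$

This is an $n^2 \times m^2$ sparse matrix with only $nm$ nonzero cells. Each nonzero cell contains a 1.

Caution. For an $n \times n$ square matrix $X$, with $\delta$ the Kronecker delta,

$$\frac{\partial (X^2)_{ij}}{\partial X_{kl}} = \frac{\partial}{\partial X_{kl}} \sum_{r=1}^n X_{ir} X_{rj} = \delta_{ik} X_{lj} + X_{ik} \delta_{jl} = \begin{cases} 0, & i \neq k,\ j \neq l \\ X_{lj}, & i = k,\ j \neq l \\ X_{ik}, & i \neq k,\ j = l \\ X_{kk} + X_{ll}, & i = k,\ j = l,\ k \neq l \\ 2X_{kk}, & i = j = k = l \end{cases}$$

so $\frac{\partial X^2}{\partial X}$ is an $n^2 \times n^2$ matrix. Thus, the usual differentiation scheme for polynomials does not apply; i.e., $\frac{\partial X^2}{\partial X}$ is not obtained by analogy with the scalar rule $\frac{d}{dx} x^2 = 2x$.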
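The Kronecker-delta form of $\partial (X^2)_{ij}/\partial X_{kl}$ can be checked entry by entry with finite differences. This sketch uses only the standard library; `formula` is a hypothetical helper that encodes $\delta_{ik}X_{lj} + X_{ik}\delta_{jl}$, and the random complex $3 \times 3$ matrix is arbitrary test data.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def formula(X, i, j, k, l):
    """d(X^2)_ij / dX_kl = delta_ik X_lj + X_ik delta_jl."""
    return (X[l][j] if i == k else 0) + (X[i][k] if j == l else 0)

random.seed(4)
n = 3
X = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
h = 1e-6
worst = 0.0
for k in range(n):
    for l in range(n):
        Xp = [list(r) for r in X]
        Xm = [list(r) for r in X]
        Xp[k][l] += h
        Xm[k][l] -= h
        Sp, Sm = matmul(Xp, Xp), matmul(Xm, Xm)
        for i in range(n):
            for j in range(n):
                numeric = (Sp[i][j] - Sm[i][j]) / (2 * h)
                worst = max(worst, abs(numeric - formula(X, i, j, k, l)))
print(worst)
```

Only the case $i = j = k = l$ gives the "scalar-looking" $2X_{kk}$; every other entry of the $n^2 \times n^2$ result has a different form.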

B.2.3  Athans  and  Schweppe  Theorems

Athans  and  Schweppe  [34]  published  a  technical  report  with  many  matrix 
gradients  for  the  case  of  real  variables,  complete  with  proofs.  These  have  been 
quite  useful  in  this  and  other  works.  This  paper  was  brought  to  my  attention 
by  Ferlez  [82]  in  a  series  of  very  interesting  and  helpful  discussions.  What 
follows  is  a  complexification  of  this  convenient  paper.  I  have  supplied  the 
proofs  and  occasional  related  corollaries. 

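Proposition 5 Let $Z \in \mathbb{C}^{n \times n}$. Then $\frac{\partial}{\partial Z} \operatorname{tr}(Z) = I_n$. This is a complexification of equation (1) in the appendix of [34].

Proof. $\operatorname{tr}(Z) = \sum_{i=1}^n Z_{ii}$, so

$$\frac{\partial}{\partial Z_{jk}} \operatorname{tr}(Z) = \begin{cases} 0, & j \neq k \\ 1, & j = k \end{cases}$$

where the $\{Z_{ij}\}$ are all algebraically independent. In particular, $Z_{ij} \neq Z_{ji}^*$ for any $i \neq j$.

Proposition 6 Let $A \in \mathbb{C}^{n \times m}$ and $Z \in \mathbb{C}^{m \times n}$. Then $\frac{\partial}{\partial Z} \operatorname{tr}(AZ) = A^T$. This is a complexification of equation (2) in the appendix of [34].

Proof. By lemma 26, $\operatorname{tr}(AZ) = \sum_{i=1}^n \sum_{j=1}^m A_{ij} Z_{ji}$. When $Z_{ij} \neq Z_{kl}^*$ for any $(i, j) \neq (k, l)$, then $\frac{\partial}{\partial Z_{kl}} \operatorname{tr}(AZ) = A_{lk}$. Thus $\frac{\partial}{\partial Z} \operatorname{tr}(AZ) = A^T$.

Proposition 7 Let $A$ and $Z$ be complex matrices of conformable sizes. Then $\frac{\partial}{\partial Z} \operatorname{tr}(A^* Z) = A^H$.

Proof. $\operatorname{tr}(A^* Z) = \sum_k \sum_l A_{kl}^* Z_{lk}$, so $\frac{\partial}{\partial Z_{lk}} \operatorname{tr}(A^* Z) = A_{kl}^*$. Thus $\frac{\partial}{\partial Z} \operatorname{tr}(A^* Z) = A^H$.

Proposition 8 Let $A^H = A \in \mathbb{C}^{n \times n}$ and $Z \in \mathbb{C}^{n \times n}$. Then $\frac{\partial}{\partial Z} \operatorname{tr}(A^* Z) = A$.

Proof. By corollary 1, $\frac{\partial}{\partial Z} \operatorname{tr}(A^* Z) = A^H = A$.

Proposition 9 Let $A \in \mathbb{C}^{n \times n}$ and $Z^H = Z \in \mathbb{C}^{n \times n}$. Then $\frac{\partial}{\partial Z} \operatorname{tr}(AZ)$ does not exist.

Proof. $\frac{\partial}{\partial Z} \operatorname{tr}(AZ) = \frac{\partial}{\partial Z} \operatorname{tr}(AZ^H)$. $\operatorname{tr}(AZ^H)$ includes a term involving $Z_{ij}^*$, and $\frac{\partial}{\partial Z_{ij}} Z_{ij}^*$ does not exist. Therefore $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^H)$ does not exist. Note that this does not depend on the structure of $A$.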
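Proposition 10 Let $A, Z \in \mathbb{C}^{n \times m}$. Then $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^T) = A$ and $\frac{\partial}{\partial Z} \operatorname{tr}(A^T Z) = A$. This is a complexification of equation (3) in the appendix of [34].

Proof. By lemma 28,

$$\operatorname{tr}(AZ^T) = \operatorname{tr}(A^T Z) = \sum_{k=1}^n \sum_{l=1}^m A_{kl} Z_{kl}.$$

Thus $\frac{\partial}{\partial Z_{ij}} \operatorname{tr}(AZ^T) = A_{ij}$ when all the $Z_{ij}$ are algebraically independent. The full matrix is then $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^T) = A$.

Proposition 11 Let $A$ and $Z$ be complex matrices of the same size. Then $\frac{\partial}{\partial Z} \operatorname{tr}(A^H Z) = A^*$.

Proof. By lemma 29, $\operatorname{tr}(A^H Z) = \sum_j \sum_k A_{jk}^* Z_{jk}$, so $\frac{\partial}{\partial Z_{jk}} \operatorname{tr}(A^H Z) = A_{jk}^*$. The full matrix is $\frac{\partial}{\partial Z} \operatorname{tr}(A^H Z) = A^*$.

Proposition 12 Let $A$ and $Z$ be complex matrices of the same size. Then $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^H)$ does not exist.

Proof. $\operatorname{tr}(AZ^H)$ includes a term involving $Z_{ij}^*$, and $\frac{\partial}{\partial Z_{ij}} Z_{ij}^*$ does not exist. Therefore $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^H)$ does not exist.

Proposition 13 Let $A$, $Z$, and $B$ be complex matrices of conformable sizes, with $Z$ of size $m \times n$, so that $AZB$ is square. Then $\frac{\partial}{\partial Z} \operatorname{tr}(AZB) = A^T B^T$. This is a complexification of equation (4) in the appendix of [34].

Proof. By lemma 31,

$$\operatorname{tr}(AZB) = \sum_i \sum_j \sum_k A_{ki} Z_{ij} B_{jk}.$$

Write $A_r$ for column $r$ of $A$ and $B^s$ for row $s$ of $B$. Then

$$\frac{\partial}{\partial Z_{rs}} \operatorname{tr}(AZB) = \sum_k A_{kr} B_{sk} = A_r^T (B^s)^T$$

and

$$\frac{\partial}{\partial Z} \operatorname{tr}(AZB) = \begin{pmatrix} A_1^T (B^1)^T & A_1^T (B^2)^T & \cdots & A_1^T (B^n)^T \\ A_2^T (B^1)^T & A_2^T (B^2)^T & \cdots & A_2^T (B^n)^T \\ \vdots & \vdots & \ddots & \vdots \\ A_m^T (B^1)^T & A_m^T (B^2)^T & \cdots & A_m^T (B^n)^T \end{pmatrix} = \begin{pmatrix} A_1^T \\ \vdots \\ A_m^T \end{pmatrix} \left((B^1)^T, \ldots, (B^n)^T\right) = A^T B^T$$

where the elements of $Z$ are algebraically independent. In particular, this does not exist if $Z_{ij} = Z_{ji}^*$ for any $i \neq j$.

Proposition 14 Let $A$, $Z$, and $B$ be complex matrices of conformable sizes, with $Z$ of size $m \times p$, so that $AZ^TB$ is square. Then $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^TB) = BA$. This is a complexification of equation (5) in the appendix of [34].

Proof. By lemma 32,

$$\operatorname{tr}(AZ^TB) = \sum_i \sum_j \sum_k A_{ki} Z_{ji} B_{jk}.$$

Write $A_s$ for column $s$ of $A$ and $B^r$ for row $r$ of $B$. From this, we know

$$\frac{\partial}{\partial Z_{rs}} \operatorname{tr}(AZ^TB) = \sum_k A_{ks} B_{rk} = A_s^T (B^r)^T = B^r A_s$$

since this is a scalar; $A_s$ is a column vector and $B^r$ is a row vector. Then

$$\frac{\partial}{\partial Z} \operatorname{tr}(AZ^TB) = \begin{pmatrix} B^1 A_1 & B^1 A_2 & \cdots & B^1 A_p \\ B^2 A_1 & B^2 A_2 & \cdots & B^2 A_p \\ \vdots & \vdots & \ddots & \vdots \\ B^m A_1 & B^m A_2 & \cdots & B^m A_p \end{pmatrix} = BA$$

where all elements of $Z$ are algebraically independent.

Proposition 15 Let $A$, $Z$, and $B$ be complex matrices of conformable sizes. Then $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^HB)$ does not exist.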


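Proposition 16 Let $A$ and $Z$ be complex matrices of conformable sizes. Then $\frac{\partial}{\partial Z^T} \operatorname{tr}(AZ) = A$. This is a complexification of equation (6) in the appendix of [34].

Proposition 17 Let $A$ and $Z$ be complex matrices of the same size. Then $\frac{\partial}{\partial Z^T} \operatorname{tr}(AZ^T) = A^T$. This is a complexification of equation (7) in the appendix of [34].

Proof. By lemma 27, $\operatorname{tr}(AZ^T) = \sum_l \sum_k A_{lk} Z_{lk}$. This implies $\frac{\partial}{\partial Z_{ij}} \operatorname{tr}(AZ^T) = A_{ij}$, which means $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^T) = A$. This, in turn, implies $\frac{\partial}{\partial Z^T} \operatorname{tr}(AZ^T) = A^T$.

Proposition 18 Let $A$ and $Z$ be complex matrices of conformable sizes. Then $\frac{\partial}{\partial Z^T} \operatorname{tr}(AZ^*)$ and $\frac{\partial}{\partial Z^T} \operatorname{tr}(AZ^H)$ do not exist.

Proof. Each of these traces contains terms involving $Z_{ij}^*$, and $\frac{\partial}{\partial Z_{ij}} Z_{ij}^*$ does not exist.

Proposition 19 Let $A$, $Z$, and $B$ be complex matrices of conformable sizes. Then $\frac{\partial}{\partial Z^T} \operatorname{tr}(AZB) = BA$. This is a complexification of equation (8) in the appendix of [34].

Proof. By proposition 13, $\frac{\partial}{\partial Z} \operatorname{tr}(AZB) = A^T B^T$, which implies

$$\frac{\partial}{\partial Z^T} \operatorname{tr}(AZB) = \left[\frac{\partial}{\partial Z} \operatorname{tr}(AZB)\right]^T = (A^T B^T)^T = BA$$

Proposition 20 Let $A$, $Z$, and $B$ be complex matrices of conformable sizes. Then $\frac{\partial}{\partial Z^T} \operatorname{tr}(AZ^TB) = A^T B^T$. This is a complexification of equation (9) in the appendix of [34].

Proof. By proposition 14, $\frac{\partial}{\partial Z} \operatorname{tr}(AZ^TB) = BA$. This implies

$$\frac{\partial}{\partial Z^T} \operatorname{tr}(AZ^TB) = \left[\frac{\partial}{\partial Z} \operatorname{tr}(AZ^TB)\right]^T = (BA)^T = A^T B^T$$

Proposition 21 Let $Z \in \mathbb{C}^{n \times n}$. Then

$$\frac{\partial}{\partial Z} \operatorname{tr}(ZZ) = \frac{\partial}{\partial Z} \operatorname{tr}(Z^2) = 2Z^T$$

and $\frac{\partial}{\partial Z^T} \operatorname{tr}(Z^2) = 2Z$. This is a complexification of equation (10) in the appendix of [34].

Proof. By lemma 30, $\operatorname{tr}(Z^2) = \sum_{i=1}^n \sum_{j=1}^n Z_{ij} Z_{ji}$. Therefore

$$\frac{\partial}{\partial Z_{ki}} \operatorname{tr}(Z^2) = Z_{ik} + Z_{ik} = 2Z_{ik}$$

which implies $\frac{\partial}{\partial Z} \operatorname{tr}(Z^2) = 2Z^T$. Similarly, $\frac{\partial}{\partial Z^T} \operatorname{tr}(Z^2) = 2Z$.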
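Several of the trace gradients above (Propositions 6, 10, 13, 14 and 21) can be spot-checked together with finite differences. The sketch below uses only the Python standard library; the sizes $m = 2$, $n = 3$, $p = 4$, the random matrices, and the helper names (`grad`, `close`, and so on) are illustrative assumptions. Each trace is a polynomial in the entries of $Z$ with no conjugates, so a real-axis central difference gives the complex partial derivative.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def rand(r, c):
    return [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def grad(f, Z, h=1e-6):
    """Numerical [df/dZ_rs] by central differences along the real axis (f analytic in Z)."""
    G = []
    for r in range(len(Z)):
        row = []
        for s in range(len(Z[0])):
            Zp = [list(x) for x in Z]
            Zm = [list(x) for x in Z]
            Zp[r][s] += h
            Zm[r][s] -= h
            row.append((f(Zp) - f(Zm)) / (2 * h))
        G.append(row)
    return G

def close(P, Q, tol=1e-6):
    return all(abs(p - q) <= tol * max(1.0, abs(q)) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

random.seed(1)
m, n, p = 2, 3, 4
Z = rand(m, n)
checks = {}

A6 = rand(n, m)  # tr(A Z) with A: n x m, Z: m x n
checks["prop6"] = close(grad(lambda W: trace(matmul(A6, W)), Z), transpose(A6))

A10 = rand(m, n)  # tr(A Z^T) with A the same size as Z
checks["prop10"] = close(grad(lambda W: trace(matmul(A10, transpose(W))), Z), A10)

A13, B13 = rand(p, m), rand(n, p)  # A Z B is p x p
checks["prop13"] = close(grad(lambda W: trace(matmul(matmul(A13, W), B13)), Z),
                         matmul(transpose(A13), transpose(B13)))

A14, B14 = rand(p, n), rand(m, p)  # A Z^T B is p x p
checks["prop14"] = close(grad(lambda W: trace(matmul(matmul(A14, transpose(W)), B14)), Z),
                         matmul(B14, A14))

S = rand(n, n)  # tr(Z^2) for a square Z
checks["prop21"] = close(grad(lambda W: trace(matmul(W, W)), S),
                         [[2 * x for x in row] for row in transpose(S)])
print(checks)
```

The $\partial/\partial Z^T$ versions (Propositions 16, 17, 19, 20) follow by transposing these results, so they are not checked separately here.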

B.2.4  Derivative  of  Determinants 

This topic addresses the computations that led to the discovery that the function referred to as the characteristic function for the complex Wishart distribution was not the straightforward function I had hoped it to be. The function of interest is $\Phi_W(T) = [\det(I_p - i\Sigma T)]^{-n}$. I attempted to compute moments of the complex Wishart distribution and did not obtain some forms which I knew to be true by other methods. Nevertheless, the following forms may be useful in another context. I have not diligently searched the literature for these results. Although I supplied the following results, they are simple enough to have been done by any senior in engineering after exposure to the simple results regarding differentiation of complex variables.




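Theorem 12 Let $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{n \times m}$, and $T \in \mathbb{C}^{m \times n}$. Then

$$\frac{\partial}{\partial T} \det(A - iBT) = -i[\det(A - iBT)][(A - iBT)^{-1} B]^T$$

where $T_{ij} \neq T_{ji}^*$.

Proof. The $(ik)$th element of the matrix $(A - iBT)$ is $A_{ik} - i \sum_{l=1}^m B_{il} T_{lk}$. The operator $\frac{\partial}{\partial T}$ is a matrix of operators $\frac{\partial}{\partial T_{jk}}$. Note that every element of column $k$ of $(A - iBT)$ contains $T_{jk}$. We expand the determinant $\det(A - iBT)$ down column $k$. Let $Q = A - iBT$ and let $Q^{pk}$ be the minor of the element $Q_{pk}$ obtained by removing row $p$ and column $k$ from $Q$. Then

$$\det(A - iBT) = \sum_{p=1}^n \left(A_{pk} - i \sum_{l=1}^m B_{pl} T_{lk}\right) (-1)^{p+k} \det(Q^{pk})$$

The $(jk)$th element of $\frac{\partial}{\partial T} \det(A - iBT)$ is

$$\frac{\partial}{\partial T_{jk}} \det(A - iBT) = \sum_{p=1}^n (-1)^{p+k} \det(Q^{pk})\, (-iB_{pj})$$

The full matrix is

$$\frac{\partial}{\partial T} \det(A - iBT) = \begin{pmatrix} \sum_p (-1)^{p+1} \det(Q^{p1})(-iB_{p1}) & \sum_p (-1)^{p+2} \det(Q^{p2})(-iB_{p1}) & \cdots & \sum_p (-1)^{p+n} \det(Q^{pn})(-iB_{p1}) \\ \sum_p (-1)^{p+1} \det(Q^{p1})(-iB_{p2}) & \sum_p (-1)^{p+2} \det(Q^{p2})(-iB_{p2}) & \cdots & \sum_p (-1)^{p+n} \det(Q^{pn})(-iB_{p2}) \\ \vdots & \vdots & \ddots & \vdots \\ \sum_p (-1)^{p+1} \det(Q^{p1})(-iB_{pm}) & \sum_p (-1)^{p+2} \det(Q^{p2})(-iB_{pm}) & \cdots & \sum_p (-1)^{p+n} \det(Q^{pn})(-iB_{pm}) \end{pmatrix}$$

where each sum runs over $p = 1, \ldots, n$. Let $B_p \stackrel{\text{def}}{=} (B_{p1}, B_{p2}, \ldots, B_{pm})$ be row $p$ of matrix $B$, and let

$$R^p \stackrel{\text{def}}{=} \left[(-1)^{p+1} \det(Q^{p1}), \ldots, (-1)^{p+n} \det(Q^{pn})\right].$$

Then

$$\frac{\partial}{\partial T} \det(A - iBT) = -i\left[B_1^T R^1 + B_2^T R^2 + \cdots + B_n^T R^n\right] = -i \sum_{p=1}^n B_p^T R^p = -i\,(B_1^T, B_2^T, \ldots, B_n^T) \begin{pmatrix} R^1 \\ \vdots \\ R^n \end{pmatrix}$$

$$= -iB^T \begin{pmatrix} (-1)^{1+1} \det Q^{11} & (-1)^{1+2} \det Q^{12} & \cdots & (-1)^{1+n} \det Q^{1n} \\ (-1)^{2+1} \det Q^{21} & (-1)^{2+2} \det Q^{22} & \cdots & (-1)^{2+n} \det Q^{2n} \\ \vdots & \vdots & \ddots & \vdots \\ (-1)^{n+1} \det Q^{n1} & (-1)^{n+2} \det Q^{n2} & \cdots & (-1)^{n+n} \det Q^{nn} \end{pmatrix} \stackrel{\text{def}}{=} -iB^T R = C$$

We recognize $R = [\det Q](Q^{-1})^T$ when $Q^{-1}$ exists. Thus when $(A - iBT)^{-1}$ exists,

$$\frac{\partial}{\partial T} \det(A - iBT) = -i[\det(A - iBT)]\, B^T [(A - iBT)^{-1}]^T = -i[\det(A - iBT)][(A - iBT)^{-1} B]^T$$

Note that if $(A - iBT)$ does not have an inverse, then $\frac{\partial}{\partial T} \det(A - iBT) = C$ is still valid and exists provided that each of the partial derivatives exists, which they do. $\square$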

Note from the discussion on complex derivatives that if $m = n$ and $T = T^H$, then $\frac{\partial}{\partial T} \det(A - iBT)$ does not exist. Moreover, whether the derivative exists does not depend on the presence or absence of structure in $A$ or $B$.
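Theorem 12 can be spot-checked numerically for an unstructured $T$. The sketch below is an illustration only, using the Python standard library with hypothetical helpers (`det` by cofactor expansion, `inv` via the adjugate, `numeric_grad` by central differences) and random test matrices with $n = 3$, $m = 2$. Because $\det(A - iBT)$ is linear in each $T_{jk}$, a real-axis central difference gives the complex partial derivative exactly, up to rounding.

```python
import random

def minor(M, i, j):
    """M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inv(M):
    """Inverse via the adjugate: (M^-1)_ij = (-1)^(i+j) det(M^{ji}) / det M."""
    d = det(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) / d for j in range(len(M))]
            for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def rand(r, c):
    return [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def numeric_grad(f, T, h=1e-6):
    """[df/dT_jk] by central differences along the real axis; valid since f is analytic in T."""
    G = []
    for j in range(len(T)):
        row = []
        for k in range(len(T[0])):
            Tp, Tm = [list(r) for r in T], [list(r) for r in T]
            Tp[j][k] += h
            Tm[j][k] -= h
            row.append((f(Tp) - f(Tm)) / (2 * h))
        G.append(row)
    return G

def max_rel_err(P, Q):
    scale = max(1.0, max(abs(q) for row in Q for q in row))
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq)) / scale

random.seed(12)
n, m = 3, 2
A, B, T = rand(n, n), rand(n, m), rand(m, n)

def Q_of(W):
    """Q = A - iBW."""
    BW = matmul(B, W)
    return [[A[i][j] - 1j * BW[i][j] for j in range(n)] for i in range(n)]

Q = Q_of(T)
# Theorem 12: -i [det Q] [Q^-1 B]^T, an m x n matrix (the shape of T).
formula = [[-1j * det(Q) * x for x in row] for row in transpose(matmul(inv(Q), B))]
numeric = numeric_grad(lambda W: det(Q_of(W)), T)
err = max_rel_err(numeric, formula)
print(err)
```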

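Theorem 13 Let $A \in \mathbb{C}^{n \times n}$, $T \in \mathbb{C}^{n \times m}$, and $B \in \mathbb{C}^{m \times n}$. Then

$$\frac{\partial}{\partial T} \det(A - iTB) = -i[\det(A - iTB)][B(A - iTB)^{-1}]^T$$

where $T_{ij} \neq T_{ji}^*$.

Proof. This is nearly the same as the previous identity. The $(jl)$th element of $A - iTB = Q$ is given by

$$Q_{jl} = A_{jl} - i \sum_{k=1}^m T_{jk} B_{kl}$$

We expand across row $j$ in evaluating $\det(A - iTB)$ to obtain

$$\det(A - iTB) = \sum_{l=1}^n \left(A_{jl} - i \sum_{k=1}^m T_{jk} B_{kl}\right) [\det(Q^{jl})] (-1)^{j+l}$$

The $(jk)$th element of $\frac{\partial}{\partial T} \det(A - iTB)$ is $-i \sum_{l=1}^n (-1)^{j+l} [\det(Q^{jl})] B_{kl}$, where $Q^{jl}$ is the minor of element $(jl)$ of $Q$ obtained by deleting row $j$ and column $l$.

The full evaluation is

$$\frac{\partial}{\partial T} \det(A - iTB) = -i \sum_{l=1}^n \begin{pmatrix} (-1)^{1+l}[\det(Q^{1l})] B_{1l} & (-1)^{1+l}[\det(Q^{1l})] B_{2l} & \cdots & (-1)^{1+l}[\det(Q^{1l})] B_{ml} \\ (-1)^{2+l}[\det(Q^{2l})] B_{1l} & (-1)^{2+l}[\det(Q^{2l})] B_{2l} & \cdots & (-1)^{2+l}[\det(Q^{2l})] B_{ml} \\ \vdots & \vdots & \ddots & \vdots \\ (-1)^{n+l}[\det(Q^{nl})] B_{1l} & (-1)^{n+l}[\det(Q^{nl})] B_{2l} & \cdots & (-1)^{n+l}[\det(Q^{nl})] B_{ml} \end{pmatrix} = -i \sum_{l=1}^n \begin{pmatrix} (-1)^{1+l}\det(Q^{1l}) \\ (-1)^{2+l}\det(Q^{2l}) \\ \vdots \\ (-1)^{n+l}\det(Q^{nl}) \end{pmatrix} (B_{1l}, B_{2l}, \ldots, B_{ml})$$

Let $B_l$ be column $l$ of matrix $B$, and let

$$Q^l = \begin{pmatrix} (-1)^{1+l}\det Q^{1l} \\ (-1)^{2+l}\det Q^{2l} \\ \vdots \\ (-1)^{n+l}\det Q^{nl} \end{pmatrix}$$

Then

$$\frac{\partial}{\partial T} \det(A - iTB) = -i \sum_{l=1}^n Q^l B_l^T = -i\,(Q^1, Q^2, \ldots, Q^n) \begin{pmatrix} B_1^T \\ \vdots \\ B_n^T \end{pmatrix}$$

We recognize the cofactor matrix

$$(Q^1, \ldots, Q^n) = \begin{pmatrix} (-1)^{1+1}\det Q^{11} & (-1)^{1+2}\det Q^{12} & \cdots & (-1)^{1+n}\det Q^{1n} \\ (-1)^{2+1}\det Q^{21} & (-1)^{2+2}\det Q^{22} & \cdots & (-1)^{2+n}\det Q^{2n} \\ \vdots & \vdots & \ddots & \vdots \\ (-1)^{n+1}\det Q^{n1} & (-1)^{n+2}\det Q^{n2} & \cdots & (-1)^{n+n}\det Q^{nn} \end{pmatrix} = [\det(Q)]\,(Q^{-1})^T$$

so that

$$\frac{\partial}{\partial T} \det Q = -i[\det(Q)]\,(Q^{-1})^T B^T = -i[\det(Q)]\,(BQ^{-1})^T$$

Therefore

$$\frac{\partial}{\partial T} \det(A - iTB) = -i[\det(A - iTB)][B(A - iTB)^{-1}]^T$$

As with the previous example, $\frac{\partial}{\partial T} \det(Q)$ can exist even when $\det Q = 0$. In that case, it is evaluated as above, before using the adjoint form for an inverse to simplify the notation.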
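The same kind of numerical spot-check applies to Theorem 13. As before, this is a standard-library sketch with hypothetical helper names and random test data ($n = 3$, $m = 2$); $\det(A - iTB)$ is linear in each $T_{jk}$, since $T_{jk}$ appears only in row $j$ of $Q$.

```python
import random

def minor(M, i, j):
    """M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inv(M):
    """Inverse via the adjugate: (M^-1)_ij = (-1)^(i+j) det(M^{ji}) / det M."""
    d = det(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) / d for j in range(len(M))]
            for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def rand(r, c):
    return [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def numeric_grad(f, T, h=1e-6):
    """[df/dT_jk] by central differences along the real axis; valid since f is analytic in T."""
    G = []
    for j in range(len(T)):
        row = []
        for k in range(len(T[0])):
            Tp, Tm = [list(r) for r in T], [list(r) for r in T]
            Tp[j][k] += h
            Tm[j][k] -= h
            row.append((f(Tp) - f(Tm)) / (2 * h))
        G.append(row)
    return G

def max_rel_err(P, Q):
    scale = max(1.0, max(abs(q) for row in Q for q in row))
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq)) / scale

random.seed(13)
n, m = 3, 2
A, T, B = rand(n, n), rand(n, m), rand(m, n)

def Q_of(W):
    """Q = A - iWB."""
    WB = matmul(W, B)
    return [[A[i][j] - 1j * WB[i][j] for j in range(n)] for i in range(n)]

Q = Q_of(T)
# Theorem 13: -i [det Q] [B Q^-1]^T, an n x m matrix (the shape of T).
formula = [[-1j * det(Q) * x for x in row] for row in transpose(matmul(B, inv(Q)))]
numeric = numeric_grad(lambda W: det(Q_of(W)), T)
err = max_rel_err(numeric, formula)
print(err)
```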

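Theorem 14 Let $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{m \times n}$, and $T \in \mathbb{C}^{n \times m}$. Then

$$\frac{\partial}{\partial T} \det(A - i(TB)^T) = -i[\det(A - i(TB)^T)][A - i(TB)^T]^{-1} B^T$$

where $T_{ij} \neq T_{ji}^*$.

Proof. Element $(jl)$ of matrix $(TB)$ is $\sum_{p=1}^m T_{jp} B_{pl}$. The $(jl)$th element of $(TB)$ is the $(lj)$th element of $(TB)^T$ in $Q = A - i(TB)^T$. The $(lj)$th element of $Q$ is given by

$$Q_{lj} = A_{lj} - i \sum_{p=1}^m T_{jp} B_{pl}$$

Expand down column $j$ to obtain

$$\det(A - i(TB)^T) = \sum_{l=1}^n \left(A_{lj} - i \sum_{p=1}^m T_{jp} B_{pl}\right) [\det(Q^{lj})] (-1)^{l+j}$$

The $(jk)$th element of $\frac{\partial}{\partial T} \det(A - i(TB)^T)$ is

$$\frac{\partial}{\partial T_{jk}} \sum_{l=1}^n \left(A_{lj} - i \sum_{p=1}^m T_{jp} B_{pl}\right) [\det(Q^{lj})] (-1)^{l+j} = -i \sum_{l=1}^n (-1)^{l+j} \det(Q^{lj})\, B_{kl}$$

where $Q^{lj}$ is the minor of $Q$ obtained by deleting row $l$ and column $j$ from $Q$. Then

$$\frac{\partial}{\partial T} \det(A - i(TB)^T) = -i \sum_{l=1}^n \begin{pmatrix} (-1)^{l+1}[\det Q^{l1}] B_{1l} & (-1)^{l+1}[\det Q^{l1}] B_{2l} & \cdots & (-1)^{l+1}[\det Q^{l1}] B_{ml} \\ (-1)^{l+2}[\det Q^{l2}] B_{1l} & (-1)^{l+2}[\det Q^{l2}] B_{2l} & \cdots & (-1)^{l+2}[\det Q^{l2}] B_{ml} \\ \vdots & \vdots & \ddots & \vdots \\ (-1)^{l+n}[\det Q^{ln}] B_{1l} & (-1)^{l+n}[\det Q^{ln}] B_{2l} & \cdots & (-1)^{l+n}[\det Q^{ln}] B_{ml} \end{pmatrix} = -i \sum_{l=1}^n \begin{pmatrix} (-1)^{l+1}\det Q^{l1} \\ (-1)^{l+2}\det Q^{l2} \\ \vdots \\ (-1)^{l+n}\det Q^{ln} \end{pmatrix} (B_{1l}, B_{2l}, \ldots, B_{ml})$$

Let

$$B_l = \begin{pmatrix} B_{1l} \\ \vdots \\ B_{ml} \end{pmatrix}$$

be column $l$ of $B$, and let

$$Q^l = \left[(-1)^{l+1}\det Q^{l1}, (-1)^{l+2}\det Q^{l2}, \ldots, (-1)^{l+n}\det Q^{ln}\right]$$

Then

$$\frac{\partial}{\partial T} \det(A - i(TB)^T) = -i \sum_{l=1}^n (Q^l)^T B_l^T = -i\left((Q^1)^T, (Q^2)^T, \ldots, (Q^n)^T\right) \begin{pmatrix} B_1^T \\ \vdots \\ B_n^T \end{pmatrix}$$

$$= -i \begin{pmatrix} (-1)^{1+1}\det Q^{11} & (-1)^{2+1}\det Q^{21} & \cdots & (-1)^{n+1}\det Q^{n1} \\ (-1)^{1+2}\det Q^{12} & (-1)^{2+2}\det Q^{22} & \cdots & (-1)^{n+2}\det Q^{n2} \\ \vdots & \vdots & \ddots & \vdots \\ (-1)^{1+n}\det Q^{1n} & (-1)^{2+n}\det Q^{2n} & \cdots & (-1)^{n+n}\det Q^{nn} \end{pmatrix} B^T = -i[\det Q]\, Q^{-1} B^T$$

by Cramer's Rule, which implies

$$\frac{\partial}{\partial T} \det[A - i(TB)^T] = -i[\det(A - i(TB)^T)][A - i(TB)^T]^{-1} B^T$$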
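A numerical spot-check of Theorem 14 in the same style (standard library only; helper names and random data are illustrative assumptions). Here $T_{jk}$ appears only in column $j$ of $Q = A - i(TB)^T$, so the determinant is linear in each entry and the real-axis central difference is exact up to rounding.

```python
import random

def minor(M, i, j):
    """M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inv(M):
    """Inverse via the adjugate: (M^-1)_ij = (-1)^(i+j) det(M^{ji}) / det M."""
    d = det(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) / d for j in range(len(M))]
            for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def rand(r, c):
    return [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def numeric_grad(f, T, h=1e-6):
    """[df/dT_jk] by central differences along the real axis; valid since f is analytic in T."""
    G = []
    for j in range(len(T)):
        row = []
        for k in range(len(T[0])):
            Tp, Tm = [list(r) for r in T], [list(r) for r in T]
            Tp[j][k] += h
            Tm[j][k] -= h
            row.append((f(Tp) - f(Tm)) / (2 * h))
        G.append(row)
    return G

def max_rel_err(P, Q):
    scale = max(1.0, max(abs(q) for row in Q for q in row))
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq)) / scale

random.seed(14)
n, m = 3, 2
A, T, B = rand(n, n), rand(n, m), rand(m, n)

def Q_of(W):
    """Q = A - i(WB)^T."""
    WB = matmul(W, B)
    return [[A[i][j] - 1j * WB[j][i] for j in range(n)] for i in range(n)]

Q = Q_of(T)
# Theorem 14: -i [det Q] Q^-1 B^T, an n x m matrix (the shape of T).
formula = [[-1j * det(Q) * x for x in row] for row in matmul(inv(Q), transpose(B))]
numeric = numeric_grad(lambda W: det(Q_of(W)), T)
err = max_rel_err(numeric, formula)
print(err)
```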

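Theorem 15 Let $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{n \times m}$, $T \in \mathbb{C}^{m \times n}$, and $c \in \mathbb{C}$. Then

$$\frac{\partial}{\partial T} \det(A - cBT) = -c[\det(A - cBT)][(A - cBT)^{-1} B]^T$$

where $T_{ij} \neq T_{ji}^*$.

Proof. Element $(ik)$ of matrix $Z = A - cBT$ is

$$Z_{ik} = A_{ik} - c \sum_{l=1}^m B_{il} T_{lk}$$

The operator $\frac{\partial}{\partial T}$ is a matrix of operators $\frac{\partial}{\partial T_{jk}}$. Note that every element of column $k$ of $Z$ contains $T_{jk}$. Expand $\det(Z)$ down column $k$. Let $Z^{pk}$ be the minor of $Z$ associated with the element $Z_{pk}$, where $Z^{pk}$ is obtained from $Z$ by removing row $p$ and column $k$. Then

$$\det Z = \sum_{p=1}^n (-1)^{p+k} \left(A_{pk} - c \sum_{l=1}^m B_{pl} T_{lk}\right) \det Z^{pk}$$

Element $(jk)$ of $\frac{\partial}{\partial T} \det Z$ is

$$\frac{\partial}{\partial T_{jk}} \det Z = \sum_{p=1}^n (-1)^{p+k} [\det Z^{pk}] (-cB_{pj})$$

The full matrix is

$$\frac{\partial}{\partial T} \det(A - cBT) = \begin{pmatrix} \sum_p (-1)^{p+1}[\det Z^{p1}](-cB_{p1}) & \cdots & \sum_p (-1)^{p+n}[\det Z^{pn}](-cB_{p1}) \\ \sum_p (-1)^{p+1}[\det Z^{p1}](-cB_{p2}) & \cdots & \sum_p (-1)^{p+n}[\det Z^{pn}](-cB_{p2}) \\ \vdots & \ddots & \vdots \\ \sum_p (-1)^{p+1}[\det Z^{p1}](-cB_{pm}) & \cdots & \sum_p (-1)^{p+n}[\det Z^{pn}](-cB_{pm}) \end{pmatrix}$$

with each sum over $p = 1, \ldots, n$. Let row $p$ of matrix $B$ be $B_p = (B_{p1}, B_{p2}, \ldots, B_{pm})$. Then

$$\frac{\partial}{\partial T} \det(A - cBT) = -c \sum_{p=1}^n B_p^T \left[(-1)^{p+1}\det(Z^{p1}), (-1)^{p+2}\det(Z^{p2}), \ldots, (-1)^{p+n}\det(Z^{pn})\right] = -c \sum_{p=1}^n B_p^T \det Z^p$$

where

$$\det Z^p \stackrel{\text{def}}{=} \left[(-1)^{p+1}\det(Z^{p1}), (-1)^{p+2}\det(Z^{p2}), \ldots, (-1)^{p+n}\det(Z^{pn})\right]$$

Therefore

$$\begin{aligned} \frac{\partial}{\partial T} \det(A - cBT) &= -c\left[B_1^T \det(Z^1) + B_2^T \det(Z^2) + \cdots + B_n^T \det(Z^n)\right] = -c\,(B_1^T, B_2^T, \ldots, B_n^T) \begin{pmatrix} \det Z^1 \\ \vdots \\ \det Z^n \end{pmatrix} \\ &= -cB^T \begin{pmatrix} (-1)^{1+1}\det Z^{11} & (-1)^{1+2}\det Z^{12} & \cdots & (-1)^{1+n}\det Z^{1n} \\ (-1)^{2+1}\det Z^{21} & (-1)^{2+2}\det Z^{22} & \cdots & (-1)^{2+n}\det Z^{2n} \\ \vdots & \vdots & \ddots & \vdots \\ (-1)^{n+1}\det Z^{n1} & (-1)^{n+2}\det Z^{n2} & \cdots & (-1)^{n+n}\det Z^{nn} \end{pmatrix} \stackrel{\text{def}}{=} X \\ &= -c[\det(A - cBT)]\, B^T [(A - cBT)^{-1}]^T \\ &= -c[\det(A - cBT)][(A - cBT)^{-1} B]^T \end{aligned}$$

since

$$Z^{-1} = \frac{1}{\det Z} \begin{pmatrix} (-1)^{1+1}\det Z^{11} & \cdots & (-1)^{1+n}\det Z^{n1} \\ \vdots & \ddots & \vdots \\ (-1)^{n+1}\det Z^{1n} & \cdots & (-1)^{n+n}\det Z^{nn} \end{pmatrix} = \frac{\operatorname{adj} Z}{\det Z}$$

If $Z^{-1}$ does not exist, $X = \frac{\partial}{\partial T} \det(A - cBT)$ is still valid and exists provided that each $\frac{\partial}{\partial T_{jk}}$ exists.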
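Theorem 15 generalizes Theorem 12 from $-i$ to an arbitrary complex constant $c$. The sketch below checks it numerically with $c = 0.7 - 1.2i$, an arbitrary choice; helper names and random test matrices are again illustrative assumptions, and only the standard library is used.

```python
import random

def minor(M, i, j):
    """M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inv(M):
    """Inverse via the adjugate: (M^-1)_ij = (-1)^(i+j) det(M^{ji}) / det M."""
    d = det(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) / d for j in range(len(M))]
            for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def rand(r, c):
    return [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def numeric_grad(f, T, h=1e-6):
    """[df/dT_jk] by central differences along the real axis; valid since f is analytic in T."""
    G = []
    for j in range(len(T)):
        row = []
        for k in range(len(T[0])):
            Tp, Tm = [list(r) for r in T], [list(r) for r in T]
            Tp[j][k] += h
            Tm[j][k] -= h
            row.append((f(Tp) - f(Tm)) / (2 * h))
        G.append(row)
    return G

def max_rel_err(P, Q):
    scale = max(1.0, max(abs(q) for row in Q for q in row))
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq)) / scale

random.seed(15)
n, m = 3, 2
c = 0.7 - 1.2j  # arbitrary complex constant
A, B, T = rand(n, n), rand(n, m), rand(m, n)

def Z_of(W):
    """Z = A - cBW."""
    BW = matmul(B, W)
    return [[A[i][j] - c * BW[i][j] for j in range(n)] for i in range(n)]

Z = Z_of(T)
# Theorem 15: -c [det Z] [Z^-1 B]^T, an m x n matrix (the shape of T).
formula = [[-c * det(Z) * x for x in row] for row in transpose(matmul(inv(Z), B))]
numeric = numeric_grad(lambda W: det(Z_of(W)), T)
err = max_rel_err(numeric, formula)
print(err)
```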
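Theorem 16 Let $A \in \mathbb{C}^{n \times n}$, $B, T \in \mathbb{C}^{n \times m}$, and $c \in \mathbb{C}$. Then

$$\frac{\partial}{\partial T} \det(A + cBT^T) = c[\det(A + cBT^T)](A + cBT^T)^{-1} B$$

where $T_{ij} \neq T_{ji}^*$.

Proof. Let $Z = A + cBT^T$. The $(pj)$th element of $Z$ is given by

$$Z_{pj} = A_{pj} + c \sum_{l=1}^m B_{pl} T_{jl}$$

Every element of column $j$ of $Z$ contains $T_{jk}$. Expand $\det Z$ down column $j$. We see that

$$\det Z = \sum_{p=1}^n (-1)^{p+j} \left(A_{pj} + c \sum_{l=1}^m B_{pl} T_{jl}\right) \det Z^{pj}$$

where $Z^{pj}$ is the minor of $Z$ associated with $Z_{pj}$. Then

$$\frac{\partial}{\partial T_{jk}} \det Z = \sum_{p=1}^n (-1)^{p+j} (cB_{pk}) \det Z^{pj}$$

and

$$\frac{\partial}{\partial T} \det(Z) = \begin{pmatrix} \sum_p (-1)^{1+p}(cB_{p1}) \det Z^{p1} & \sum_p (-1)^{1+p}(cB_{p2}) \det Z^{p1} & \cdots & \sum_p (-1)^{1+p}(cB_{pm}) \det Z^{p1} \\ \sum_p (-1)^{2+p}(cB_{p1}) \det Z^{p2} & \sum_p (-1)^{2+p}(cB_{p2}) \det Z^{p2} & \cdots & \sum_p (-1)^{2+p}(cB_{pm}) \det Z^{p2} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_p (-1)^{n+p}(cB_{p1}) \det Z^{pn} & \sum_p (-1)^{n+p}(cB_{p2}) \det Z^{pn} & \cdots & \sum_p (-1)^{n+p}(cB_{pm}) \det Z^{pn} \end{pmatrix} = c \sum_{p=1}^n \begin{pmatrix} (-1)^{1+p}\det Z^{p1} \\ \vdots \\ (-1)^{n+p}\det Z^{pn} \end{pmatrix} (B_{p1}, B_{p2}, \ldots, B_{pm})$$

with each sum over $p = 1, \ldots, n$. Let $B_p = (B_{p1}, B_{p2}, \ldots, B_{pm})$ and

$$\det Z^p = \begin{pmatrix} (-1)^{1+p}\det Z^{p1} \\ \vdots \\ (-1)^{n+p}\det Z^{pn} \end{pmatrix}$$

Then

$$\frac{\partial}{\partial T} \det(Z) = c \sum_{p=1}^n [\det Z^p]\, B_p = c\,[\det Z^1, \det Z^2, \ldots, \det Z^n] \begin{pmatrix} B_1 \\ \vdots \\ B_n \end{pmatrix} = c \begin{pmatrix} (-1)^{1+1}\det Z^{11} & \cdots & (-1)^{1+n}\det Z^{n1} \\ \vdots & \ddots & \vdots \\ (-1)^{n+1}\det Z^{1n} & \cdots & (-1)^{n+n}\det Z^{nn} \end{pmatrix} B \stackrel{\text{def}}{=} cRB$$

$$= c[\operatorname{adj} Z]\, B = c[\det Z]\, Z^{-1} B$$

when $Z^{-1}$ exists. Thus

$$\frac{\partial}{\partial T} \det(A + cBT^T) = c[\det(A + cBT^T)](A + cBT^T)^{-1} B$$

when $(A + cBT^T)^{-1}$ exists. Even when $(A + cBT^T)^{-1}$ does not exist,

$$\frac{\partial}{\partial T} \det(A + cBT^T) = cRB$$

is valid when each $\frac{\partial}{\partial T_{jk}}$ exists.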
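Theorem 16 can be checked the same way. In this sketch (standard library only), $c = -0.4 + 0.9i$ and the random $B, T \in \mathbb{C}^{3 \times 2}$ are arbitrary test choices, and the helper names are hypothetical. Note that the result $c[\det Z]Z^{-1}B$ has no transpose, unlike Theorems 12 and 15.

```python
import random

def minor(M, i, j):
    """M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inv(M):
    """Inverse via the adjugate: (M^-1)_ij = (-1)^(i+j) det(M^{ji}) / det M."""
    d = det(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) / d for j in range(len(M))]
            for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

def rand(r, c):
    return [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def numeric_grad(f, T, h=1e-6):
    """[df/dT_jk] by central differences along the real axis; valid since f is analytic in T."""
    G = []
    for j in range(len(T)):
        row = []
        for k in range(len(T[0])):
            Tp, Tm = [list(r) for r in T], [list(r) for r in T]
            Tp[j][k] += h
            Tm[j][k] -= h
            row.append((f(Tp) - f(Tm)) / (2 * h))
        G.append(row)
    return G

def max_rel_err(P, Q):
    scale = max(1.0, max(abs(q) for row in Q for q in row))
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq)) / scale

random.seed(16)
n, m = 3, 2
c = -0.4 + 0.9j  # arbitrary complex constant
A, B, T = rand(n, n), rand(n, m), rand(n, m)

def Z_of(W):
    """Z = A + cBW^T."""
    BWt = matmul(B, transpose(W))
    return [[A[i][j] + c * BWt[i][j] for j in range(n)] for i in range(n)]

Z = Z_of(T)
# Theorem 16: c [det Z] Z^-1 B, an n x m matrix (the shape of T).
formula = [[c * det(Z) * x for x in row] for row in matmul(inv(Z), B)]
numeric = numeric_grad(lambda W: det(Z_of(W)), T)
err = max_rel_err(numeric, formula)
print(err)
```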
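B.3  Differential  Operator  D(z)

The work in this section is my own. I have not searched the literature for similar results. They are simple enough to have been done by any senior in engineering after exposure to the simple results of differentiation of a complex variable.

Definition 4 Define the differential operator $D(z)$, which operates on complex-valued functions, by

$$D(z) \stackrel{\text{def}}{=} \left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right)$$

where $x = \mathrm{Re}(z)$ and $y = \mathrm{Im}(z)$. This form becomes useful when using characteristic functions to evaluate expected moments of a distribution. We want to learn some basic properties of $D(z)$.

First, look at the relationship between $D(z)f$ and $\frac{df}{dz}$ for a complex-valued function $f$. Suppose the derivative $\frac{df}{dz}$ exists. Then by the Cauchy-Riemann equations, $\frac{\partial f}{\partial y} = i\frac{\partial f}{\partial x} = i\frac{df}{dz}$, so

$$D(z)f(z) = \frac{\partial f}{\partial x} + i\frac{\partial f}{\partial y} = \frac{df}{dz} + i\left(i\frac{df}{dz}\right) = 0.$$

This says that when $\frac{df}{dz}$ exists, $D(z)f(z) = 0$. Let us consider a few simple cases, writing $D(z^*) = \frac{\partial}{\partial x} - i\frac{\partial}{\partial y}$ for the conjugate operator.

$$D(z)\,z = \left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right)(x + iy) = 1 + i^2 = 0 \qquad \text{(B.20)}$$

$$D(z)\,z^* = \left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right)(x - iy) = 1 + 1 = 2 \qquad \text{(B.21)}$$

$$D(z)\,\mathrm{Re}(z) = D(z)\,\tfrac{1}{2}(z + z^*) = 1 \qquad \text{(B.22)}$$

$$D(z)\,\mathrm{Im}(z) = D(z)\,\tfrac{1}{2i}(z - z^*) = \tfrac{1}{2i}(0 - 2) = i \qquad \text{(B.23)}$$

$$D(z^*)\,z = \left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right)(x + iy) = 1 + 1 = 2 \qquad \text{(B.24)}$$

$$D(z)\,|z| = D(z)(x^2 + y^2)^{1/2} = \tfrac{1}{2}(x^2 + y^2)^{-1/2} D(z)(x^2 + y^2) = \frac{2x + i2y}{2|z|} = \frac{z}{|z|} \qquad \text{(B.25)}$$

Thus $D(z)\,|z|$ produces a unit-length vector pointing in the direction of $z$.

$$D(z^*)\,|z| = \frac{1}{2|z|}(2x - i2y) = \frac{z^*}{|z|} \qquad \text{(B.26)}$$

$$D(z^*)\,|z|^2 = 2z^* \qquad \text{(B.28)}$$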
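The scalar identities (B.20) through (B.28) can be confirmed numerically by approximating $\partial/\partial x$ and $\partial/\partial y$ with central differences. This sketch uses only built-in Python complex arithmetic; `D` and `D_conj` are hypothetical helper names, and $z_0 = 0.6 - 1.3i$ is an arbitrary nonzero test point (nonzero so that $|z|$ is differentiable there).

```python
def D(f, z, h=1e-6):
    """Numerical D(z)f = df/dx + i df/dy at z = x + iy (central differences)."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return dfdx + 1j * dfdy

def D_conj(f, z, h=1e-6):
    """Numerical D(z*)f = df/dx - i df/dy."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return dfdx - 1j * dfdy

z0 = 0.6 - 1.3j  # arbitrary nonzero test point
r = abs(z0)
cases = {
    "B.20": (D(lambda z: z, z0), 0),
    "B.21": (D(lambda z: z.conjugate(), z0), 2),
    "B.22": (D(lambda z: z.real, z0), 1),
    "B.23": (D(lambda z: z.imag, z0), 1j),
    "B.24": (D_conj(lambda z: z, z0), 2),
    "B.25": (D(abs, z0), z0 / r),
    "B.26": (D_conj(abs, z0), z0.conjugate() / r),
    "B.28": (D_conj(lambda z: abs(z) ** 2, z0), 2 * z0.conjugate()),
}
for label, (numeric, exact) in cases.items():
    print(label, numeric, exact)
```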


B.3.1  Vector  Case 

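Let $z$ now be a complex vector in $\mathbb{C}^n$. Then define

$$D(z) \stackrel{\text{def}}{=} \begin{pmatrix} D(z_1) \\ D(z_2) \\ \vdots \\ D(z_n) \end{pmatrix} \qquad \text{(B.29)}$$

For the following discussion, let $a \in \mathbb{C}^n$ and $A \in \mathbb{C}^{n \times n}$.

Let $f(z) = a^T z$. Then

$$D(z)\, a^T z = \begin{pmatrix} D(z_1) \\ \vdots \\ D(z_n) \end{pmatrix} \sum_{i=1}^n a_i z_i = 0, \qquad \text{(B.30)}$$

the zero vector. Likewise $D(z)\, a^H z = 0$. Similarly,

$$D(z)\, z^T A z = D(z) \sum_{i=1}^n \sum_{j=1}^n z_i A_{ij} z_j = D(z) \left[ \begin{aligned} & A_{11} z_1^2 + A_{12} z_1 z_2 + \cdots + A_{1n} z_1 z_n \\ & + A_{21} z_2 z_1 + A_{22} z_2^2 + \cdots + A_{2n} z_2 z_n \\ & + A_{31} z_3 z_1 + A_{32} z_3 z_2 + \cdots + A_{3n} z_3 z_n \\ & + \cdots \\ & + A_{n1} z_n z_1 + A_{n2} z_n z_2 + \cdots + A_{nn} z_n^2 \end{aligned} \right] = 0 \qquad \text{(B.31)}$$

and

$$D(z)\, z^H A z = D(z) \left[ \begin{aligned} & A_{11} |z_1|^2 + A_{12} z_1^* z_2 + \cdots + A_{1n} z_1^* z_n \\ & + A_{21} z_2^* z_1 + A_{22} |z_2|^2 + \cdots + A_{2n} z_2^* z_n \\ & + \cdots \\ & + A_{n1} z_n^* z_1 + A_{n2} z_n^* z_2 + \cdots + A_{nn} |z_n|^2 \end{aligned} \right] = \begin{pmatrix} 2A_{11} z_1 + 2A_{12} z_2 + \cdots + 2A_{1n} z_n \\ 2A_{21} z_1 + 2A_{22} z_2 + \cdots + 2A_{2n} z_n \\ \vdots \\ 2A_{n1} z_1 + 2A_{n2} z_2 + \cdots + 2A_{nn} z_n \end{pmatrix}$$

$$= 2 \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} = 2Az \qquad \text{(B.32)}$$

Note that this result does not depend on the symmetry of $A$; the structure of $A$ does not influence the form of this result.

Now let $y \in \mathbb{C}^n$ be a vector that does not depend on $z$, and write $A_i$ for row $i$ and $A^i$ for column $i$ of $A$, so that $A = (A^1, A^2, \ldots, A^n)$. Then

$$D(z)\, z^T A y = D(z) \sum_{i=1}^n z_i A_i y = 0 \qquad \text{(B.33)}$$

Note that this exists and is zero.

$$D(z)\, z^H A y = D(z) \sum_{i=1}^n z_i^* A_i y = \begin{pmatrix} 2A_1 y \\ \vdots \\ 2A_n y \end{pmatrix} = 2Ay \qquad \text{(B.34)}$$

$$D(z)\, y^T A z = D(z) \sum_{i=1}^n y^T A^i z_i = 0 \qquad \text{(B.35)}$$

This exists and is zero.

$$D(z)\, y^T A z^* = D(z) \sum_{i=1}^n y^T A^i z_i^* = \begin{pmatrix} 2y^T A^1 \\ \vdots \\ 2y^T A^n \end{pmatrix} = 2A^T y \qquad \text{(B.36)}$$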
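Three of the vector results, (B.31), (B.32) and (B.36), can be spot-checked by applying $D(z_k) = \partial/\partial x_k + i\,\partial/\partial y_k$ numerically to each component. This is a standard-library sketch; `D_vec` and the random $A$, $z$, $y$ are illustrative assumptions. All three functions are quadratic in $(x_k, y_k)$, so central differences are exact up to rounding.

```python
import random

def D_vec(f, z, h=1e-6):
    """Numerical D(z)f: entry k is df/dx_k + i df/dy_k, where z_k = x_k + i y_k."""
    out = []
    for k in range(len(z)):
        def shifted(d):
            w = list(z)
            w[k] += d
            return f(w)
        dfdx = (shifted(h) - shifted(-h)) / (2 * h)
        dfdy = (shifted(1j * h) - shifted(-1j * h)) / (2 * h)
        out.append(dfdx + 1j * dfdy)
    return out

def rand_c():
    return complex(random.gauss(0, 1), random.gauss(0, 1))

random.seed(6)
n = 3
A = [[rand_c() for _ in range(n)] for _ in range(n)]
z = [rand_c() for _ in range(n)]
y = [rand_c() for _ in range(n)]  # fixed vector, independent of z

def zT_A_z(w):  # (B.31): analytic in w, so D(z) gives 0
    return sum(w[i] * A[i][j] * w[j] for i in range(n) for j in range(n))

def zH_A_z(w):  # (B.32): D(z) gives 2 A z
    return sum(w[i].conjugate() * A[i][j] * w[j] for i in range(n) for j in range(n))

def yT_A_zconj(w):  # (B.36): D(z) gives 2 A^T y
    return sum(y[i] * A[i][j] * w[j].conjugate() for i in range(n) for j in range(n))

two_Az = [2 * sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
two_ATy = [2 * sum(A[i][j] * y[i] for i in range(n)) for j in range(n)]
err31 = max(abs(v) for v in D_vec(zT_A_z, z))
err32 = max(abs(a - b) for a, b in zip(D_vec(zH_A_z, z), two_Az))
err36 = max(abs(a - b) for a, b in zip(D_vec(yT_A_zconj, z), two_ATy))
print(err31, err32, err36)
```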


B.3.2  Matrix  Case 


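Let $Z$ now be an unstructured matrix in $M_n(\mathbb{C})$. Then

$$D(Z_{ij}) \det Z = D(Z_{ij}) \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma)\, Z_{1\sigma(1)} Z_{2\sigma(2)} \cdots Z_{n\sigma(n)} = \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma)\, D(Z_{ij}) \prod_{k=1}^n Z_{k\sigma(k)} = 0 \qquad \text{(B.37)}$$

where $S_n$ is the set of permutations of $n$ letters, $\sigma \in S_n$, and $\sigma(k)$ is the image of $k$ under $\sigma$.

Suppose $Z = Z^H$ is a $2 \times 2$ matrix,

$$Z = \begin{pmatrix} Z_{11} & Z_{12} \\ Z_{12}^* & Z_{22} \end{pmatrix}.$$

Then

$$D(Z_{12}) \det Z = D(Z_{12})\left(Z_{11} Z_{22} - |Z_{12}|^2\right) = -2Z_{12}.$$

Now consider the $3 \times 3$ matrix $Z = Z^H$. Then

$$\begin{aligned} D(Z_{12}) \det Z &= D(Z_{12})\left[Z_{11} Z_{22} Z_{33} + Z_{12} Z_{13}^* Z_{23} + Z_{12}^* Z_{13} Z_{23}^* - |Z_{13}|^2 Z_{22} - Z_{11} |Z_{23}|^2 - |Z_{12}|^2 Z_{33}\right] \\ &= 2Z_{13} Z_{23}^* - 2Z_{12} Z_{33} = 2(Z_{13} Z_{23}^* - Z_{12} Z_{33}) = -2\det(Z^{21}) \end{aligned}$$

where $Z^{21}$ is the minor of $Z_{21} = Z_{12}^*$. Similarly,

$$D(Z_{13}) \det Z = 2Z_{12} Z_{23} - 2Z_{13} Z_{22} = 2\det(Z^{31})$$

$$D(Z_{23}) \det Z = 2Z_{12}^* Z_{13} - 2Z_{11} Z_{23} = -2\det(Z^{32})$$

Finally,

$$D(Z_{ii}) \det Z = 0.$$

Putting this all together, and treating the entries below the diagonal in the same way, we get

$$D(Z) \det Z = 2\begin{pmatrix} 0 & -\det Z^{21} & \det Z^{31} \\ -\det Z^{12} & 0 & -\det Z^{32} \\ \det Z^{13} & -\det Z^{23} & 0 \end{pmatrix} = 2\begin{pmatrix} 0 & (-1)^{1+2}\det Z^{21} & (-1)^{1+3}\det Z^{31} \\ (-1)^{2+1}\det Z^{12} & 0 & (-1)^{2+3}\det Z^{32} \\ (-1)^{3+1}\det Z^{13} & (-1)^{3+2}\det Z^{23} & 0 \end{pmatrix}$$

Recall that

$$Z^{-1} = \frac{\operatorname{adj} Z}{\det Z} = \frac{1}{\det Z}\begin{pmatrix} (-1)^{1+1}\det Z^{11} & (-1)^{1+2}\det Z^{21} & (-1)^{1+3}\det Z^{31} \\ (-1)^{2+1}\det Z^{12} & (-1)^{2+2}\det Z^{22} & (-1)^{2+3}\det Z^{32} \\ (-1)^{3+1}\det Z^{13} & (-1)^{3+2}\det Z^{23} & (-1)^{3+3}\det Z^{33} \end{pmatrix}$$

Thus

$$D(Z) \det Z = 2\left[Z^{-1} \det Z - \operatorname{diag}\left(\det Z^{11}, \det Z^{22}, \det Z^{33}\right)\right]$$

In general, for $Z = Z^H \in M_n(\mathbb{C})$, we have

$$D(Z) \det(Z) = 2\left[Z^{-1} \det(Z) - \operatorname{diag}\left(\det Z^{11}, \ldots, \det Z^{nn}\right)\right] \qquad \text{(B.38)}$$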
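The off-diagonal entries of (B.38) can be spot-checked numerically for a random Hermitian $3 \times 3$ matrix: vary $Z_{rc} = x + iy$ while keeping $Z_{cr} = x - iy$, and form $\partial/\partial x + i\,\partial/\partial y$ of $\det Z$. This standard-library sketch uses hypothetical helper names and does not test the diagonal entries, where the text above takes $D(Z_{ii}) \det Z = 0$.

```python
import random

def minor(M, i, j):
    """M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inv(M):
    """Inverse via the adjugate: (M^-1)_ij = (-1)^(i+j) det(M^{ji}) / det M."""
    d = det(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) / d for j in range(len(M))]
            for i in range(len(M))]

def with_hermitian_entry(Z, r, c, v):
    """Copy of Hermitian Z with Z_rc = v and Z_cr = conj(v), for r != c."""
    W = [list(row) for row in Z]
    W[r][c] = v
    W[c][r] = v.conjugate()
    return W

def D_entry(Z, r, c, h=1e-6):
    """Numerical D(Z_rc) det Z = d/dx + i d/dy, where Z_rc = x + iy and Z_cr = x - iy."""
    z0 = Z[r][c]
    f = lambda v: det(with_hermitian_entry(Z, r, c, v))
    dfdx = (f(z0 + h) - f(z0 - h)) / (2 * h)
    dfdy = (f(z0 + 1j * h) - f(z0 - 1j * h)) / (2 * h)
    return dfdx + 1j * dfdy

random.seed(7)
n = 3
M = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
Z = [[(M[i][j] + M[j][i].conjugate()) / 2 for j in range(n)] for i in range(n)]  # Z = Z^H
dZ, Zinv = det(Z), inv(Z)
worst = 0.0
for r in range(n):
    for c in range(n):
        if r != c:  # off-diagonal entries of (B.38): 2 (Z^-1 det Z)_rc
            worst = max(worst, abs(D_entry(Z, r, c) - 2 * dZ * Zinv[r][c]))
print(worst)
```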

B.3.3  Differential  Functions  of  Determinants 

I supplied these results myself and have not diligently searched the literature for them. They are simple enough to have been done by any senior in engineering after brief exposure to the principles of differentiation of a complex variable.

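Proposition 22 Let $A \in \mathbb{C}^{n \times n}$, $B \in \mathbb{C}^{n \times m}$, and $T \in \mathbb{C}^{m \times n}$. Then

$$D(T) \det(A - iBT) = 0$$

Proof. The $(ik)$th element of the matrix $(A - iBT)$ is $A_{ik} - i \sum_{l=1}^m B_{il} T_{lk}$. Every element of column $k$ of $(A - iBT)$ contains $T_{jk}$. Expand $\det(A - iBT)$ down column $k$. Let $Z = A - iBT$ and let $Z^{pk}$ be the minor of element $Z_{pk}$ obtained by removing row $p$ and column $k$ from $Z$. Then

$$\det(Z) = \sum_{p=1}^n (-1)^{p+k} \left(A_{pk} - i \sum_{l=1}^m B_{pl} T_{lk}\right) \det(Z^{pk})$$

The $(jk)$th element of $D(T) \det(A - iBT)$ is $D(T_{jk}) \det Z = 0$, because $\det Z$ contains no conjugated elements of $T$. Thus

$$D(T) \det(A - iBT) = 0$$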

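Theorem 17 Let $A, B \in \mathbb{C}^{n \times n}$ and $T^H = T \in \mathbb{C}^{n \times n}$. Then

$$D(T) \det(A - iBT) = i\left[\Delta(Z^{-1}B) - 2Z^{-1}B\right] \det(Z)$$

where $Z = A - iBT$ and $\Delta(A)$ is the diagonal matrix of the diagonal elements of $A$. Further,

$$D(T) \det(I - iBT)\Big|_{T=0} = i\left[\Delta(B) - 2B\right]$$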

Proof.  We  know  from  previous  examples  that 

n  m 

det(Z)  =  det(A  -  iBT)  =  '£{-l)^^'‘{Ar,k  -  i'^  B^,Tik)  det{Z^'^) 

p=i  1=1 

where  is  the  minor  of  Z  =  A  —iBT  formed  by  deleting  row  p  and  column  k. 
We  now  have  the  additional  relationship  that  Tik  =  We  know  D{Tjk)Tjk  = 
0  and  D{Tjk)Tjk  =  2  for  j  ^  k.  Also, 

DiT,,)T,,  =  D{T,j)Re{T,,)  =  l 
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since  Tjj  €  R.  To  locate  nonzero  entries  of  D{T)  det(Z)  we  rewrite  ciet(Z)  as 

n  m 

det(Z)  = 

p=l  /=! 

Then 

C(Tjj)det(Z)  =  £(-l)'’+‘(-.2B„)det(Z"') 

p=l 

for  k  ^  y,  and 

I)(T«)det(Z)  = 

p=l 


Then  alle  zusammen,  D(T)  det(Z)  = 

~i  t  (-l)"+'5pi  det(ZP‘)  •  •  •  -it  (-l)P+'2Rpn  det(ZPi) 

p=i  p=i 

-i  t  (-iy+^2Bpi  det(ZP2)  ...  -i  f  (-l)P+22^pn  det(ZP2) 

p=i  p=i 

-i  t  (-l)'’+”2Rpi  det(ZP")  •  •  •  -it  (-l)'’+"Rpn  det(ZP") 

p=i  p=i 

Let  Bp  =  {Bpi,  Bp2,  ,  Bpm)  be  row  p  of  matrix  B.  Note  that 


n 

f=i: 


p=i 


{-ly+^Bp,  det(ZPi) 
i-iy+^Bp,  det(ZP2) 

(-l)p+^Bpi  det(Z'”*) 


(-l)P+iBp„det(ZPi) 

{-l)P+^BpndetiZr>^) 

(-iy+^Bp„  det(ZP") 


(-l)P+i  det(ZPM 

n 

=  E 

p=i 

(_l)P+n  det(ZP”) 


5p 


(_l)i+idet(Z“)  •••  (-!)"+» det(Z"i)  Bi 

=  :  :  :  =  [det(Z)]Z-^B 

(-l)i+”det(Zi”)  •••  (-l)"+"det(Z"”) 

Substituting  into  our  problem,  we  obtain 

D{T)[det{Z)]  =  -i2Z-^Bdet(Z)  +  iA[Z-^Bdet(Z)] 

where  A(A)  is  the  diagonal  matrix  whose  elements  are  on  the  main  diagonal 
of  matrix  A.  Scalars  commute,  so  we  have 

Z)(T)[det(Z)]  =  i[A{Z-'^B)  -  2Z-^B]det(Z) 

Expanded,  we  obtain 

D{T)  det{A  -  iBT)  =  i[A{(A  -  iBTy^B]  -  2{A  - 
When  this  is  evaluated  at  T  =  0,  we  obtain 

I>(r)[det(Z)]ly^o  =  i\A{A-'^B)  -  2A-^B] 

When  we  further  simplify  to  /I  =  /,  then 

D{T)det{I  -iBT)  =i[A{B)-2B] 

T  =  0 
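Theorem 17 can be checked numerically at $T = 0$. The sketch below is a minimal illustration that assumes NumPy; its function names are hypothetical. It estimates each entry $D(T_{jk})\det(A - iBT)$ by central differences over Hermitian perturbations of $T$. An off-diagonal $T_{jk}$ moves together with $T_{kj} = \overline{T}_{jk}$. A diagonal $T_{jj}$ is real, so only its real-part derivative is taken. The result is then compared with $i[\Lambda(A^{-1}B) - 2A^{-1}B]\det(A)$.

```python
import numpy as np

def D_det_at_zero(A, B, h=1e-6):
    """Finite-difference D(T) det(A - iBT) at T = 0 over Hermitian T,
    with D(T_jk) = d/dRe(T_jk) + i d/dIm(T_jk)."""
    n = A.shape[0]
    f = lambda T: np.linalg.det(A - 1j * B @ T)
    out = np.zeros((n, n), dtype=complex)
    for j in range(n):
        for k in range(n):
            E_re = np.zeros((n, n), dtype=complex)
            E_re[j, k] = E_re[k, j] = 1.0        # real-part direction, stays Hermitian
            d_re = (f(h * E_re) - f(-h * E_re)) / (2 * h)
            if j == k:
                out[j, k] = d_re                 # T_jj is real: no imaginary part
            else:
                E_im = np.zeros((n, n), dtype=complex)
                E_im[j, k], E_im[k, j] = 1j, -1j  # imaginary-part direction
                d_im = (f(h * E_im) - f(-h * E_im)) / (2 * h)
                out[j, k] = d_re + 1j * d_im
    return out

def theorem17_at_zero(A, B):
    """Closed form i [Lambda(A^{-1}B) - 2 A^{-1}B] det(A)."""
    M = np.linalg.solve(A, B)
    return 1j * (np.diag(np.diag(M)) - 2 * M) * np.linalg.det(A)

rng = np.random.default_rng(0)
n = 3
A = 4 * np.eye(n) + rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
print(np.max(np.abs(D_det_at_zero(A, B) - theorem17_at_zero(A, B))))  # should be near zero
```

With `A = np.eye(n)` the closed form reduces to `1j * (np.diag(np.diag(B)) - 2 * B)`, which matches the last line of the theorem.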

B.4  Complex  Characteristic  Functions 


The discussion of characteristic functions of complex variables is almost nonexistent in the literature. C. R. Rao [218] provided a definition which is the starting basis for the following study. The remainder of the results in this section were developed with Ferlez [82]. Acknowledgment of his contribution does
not  constitute  his  certification  that  he  has  reviewed  and  approved  the  enclosed 
material.  Although  we  developed  different  results,  the  work  contained  here  is 
greatly  extended  beyond  its  original  bounds  and  more  thoroughly  thought 
out  because  of  the  wonderful  semester  of  discussions  with  Ferlez.  Thus,  even 
though  I  am  responsible  for  these  results,  they  would  not  have  been  developed 
without  his  active  insights.  If  these  results  stand  the  test  of  close  examination, 
then  he  should  receive  much  of  the  positive  credit. 

In  the  case  of  real  variables,  the  theory  of  characteristic  functions  has  been 
cast  as  an  application  of  the  Fourier  transform.  Properties  of  the  Fourier 
transform  have  been  extensively  developed  and  widely  used,  particularly  by 
scientists  and  engineers  working  with  time  series  data.  A  major  attraction  of 
characteristic  functions  to  statistics  is  that  they  provide  a  conceptually  simple 
way  to  evaluate  the  expected  value  of  some  linear  combinations  of  random 
variables.  In  distribution  theory,  the  Fourier  transform  provides  a  useful  tool 
for  obtaining  nice  results  via  its  properties.  A  first  exposure  to  this  application 
in  statistics  often  is  in  the  proof  of  the  Central  Limit  Theorem,  which  yields 
results  more  general  than  obtainable  when  restricting  attention  to  moment 
generating  functions. 

Characteristic functions are important to the development of the properties of the distributions needed as background for the eventual development of the joint density of the sample eigenvalues of the complex Wishart matrix. The motivation for researching various forms of a characteristic function had its genesis in examining the existence of the derivative when applied to the characteristic function for the complex Wishart distribution. In particular, this examination included applying principles used in deriving the Cauchy-Riemann equations. When examining moments of the complex Wishart distribution, a case will arise in which blind application of the formula referred to as the “characteristic function” does not yield (at first blush) what is desired. There are two reasons for this. One reason is that the usually cited formula is not really the characteristic function of the complex Wishart distribution; rather, it is the characteristic function of another complex matrix variable that is algebraically related to the complex Wishart random variable. The second reason is that the derivative of the function with respect to the transform variable matrix does not exist. It was this discovery that led to understanding the need for developing the theory of the characteristic function of complex matrix variables.

In the case of complex variables, the choices of functions to call a “characteristic function” widen. The unwary engineer (or theoretician) may apply one concept of a characteristic function to a result derived by another worker using a different concept. These different concepts appear similar, but they have different properties. After worrying for a while over which version was the “correct” version, it became apparent that merely searching for a definition from an authority was not as instructive as attempting to construct families of functions and observing the resulting properties. After all, it is the set of properties of the characteristic function that makes characteristic functions important, not the fact that someone dreamed up a formula with a fancy name that also has applications in other disciplines. The material that follows is the record of these investigations. By reading it, you should be able to develop an idea of the different kinds of issues that need to be considered.

Some properties I prefer to have in a characteristic function for complex variables follow:

1. When reduced to the real variables case, the complex variable characteristic function should behave as the real variable characteristic function.

2. The complex characteristic function should be useful for computing expected values of moments.

3. The complex characteristic function should be useful in deriving characteristic functions of linear and affine functions of complex random variables.

4. It should be fairly easy to determine the existence of the desired operations, such as a derivative (if used) applied to characteristic functions to compute moments.

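The following forms are the starting points for the discussion to follow.

$$\Phi_Z(T) = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HZ)])\}$$

$$\Psi_Z(T) = \mathcal{E}\{\exp(i\operatorname{tr}(T^TZ))\}$$

$$\Omega_Z(T) = \mathcal{E}\{\exp(i\operatorname{tr}(T^HZ))\}$$

$$\nu_Z(T) = \mathcal{E}\{\exp(ig(\langle T, Z\rangle))\}$$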

B.4.1 Definition of Characteristic Function of a Complex Random Matrix

This study begins with a derivation of the characteristic function of a complex random variable, as presented by C. R. Rao.

Definition 5 Let $z$ be a complex vector random variable, and let $t$ be any complex vector of the same dimension as $z$. Then the characteristic function is given by

$$\Phi_z(t) = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(t^Hz)])\}$$

To see this, let $t = t_1 + it_2$ and $z = x + iy$. Then

$$t^Hz = (t_1^T - it_2^T)(x + iy) = t_1^Tx + t_2^Ty + i(t_1^Ty - t_2^Tx)$$

Thus $\operatorname{Re}[t^Hz] = t_1^Tx + t_2^Ty$. Consider $s^T = (t_1^T, t_2^T)$ and $w^T = (x^T, y^T)$, where $s, w \in \mathbb{R}^{2n}$ for $n$-dimensional $z$. Then $s^Tw = t_1^Tx + t_2^Ty$ and

$$\phi_w(s) = \mathcal{E}\{\exp[i(s^Tw)]\} = \mathcal{E}\{\exp[i(t_1^Tx + t_2^Ty)]\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(t^Hz)])\} = \Phi_z(t)$$

Note that $\operatorname{Re}(t^Hz)$ satisfies the properties of an inner product on $\mathbb{C}^n$ regarded as the real vector space $\mathbb{R}^{2n}$.
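The sketch below illustrates Definition 5 in two ways. It is a minimal illustration that assumes NumPy, and its variable names are illustrative. First, it checks the real-embedding identity $\operatorname{Re}(t^Hz) = s^Tw$. Second, it estimates $\Phi_z(t)$ by Monte Carlo for a circular complex Gaussian $z$ with $\mathcal{E}\{zz^H\} = \Sigma$. The example distribution and the reference value $\exp(-t^H\Sigma t/4)$ are not taken from the text. The reference value is the standard characteristic function of that distribution, because $s^Tw$ is then real Gaussian with variance $t^H\Sigma t/2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2

# Real-embedding identity behind Definition 5: Re(t^H z) = s^T w
t = rng.normal(size=n) + 1j * rng.normal(size=n)
z = rng.normal(size=n) + 1j * rng.normal(size=n)
s = np.concatenate([t.real, t.imag])     # s^T = (t1^T, t2^T)
w = np.concatenate([z.real, z.imag])     # w^T = (x^T, y^T)
lhs = np.real(np.vdot(t, z))             # np.vdot conjugates its first argument: t^H z
rhs = s @ w

# Monte Carlo estimate of Phi_z(t) = E{exp(i Re[t^H z])} for circular complex
# Gaussian z = L (u + iv) / sqrt(2), so that E{z z^H} = L L^H = Sigma
L = np.array([[1.0, 0.0], [0.5 + 0.5j, 1.0]])
Sigma = L @ L.conj().T
N = 200_000
u = rng.normal(size=(N, n))
v = rng.normal(size=(N, n))
zs = (u + 1j * v) @ L.T / np.sqrt(2)     # each row is one draw of z
phi_mc = np.mean(np.exp(1j * np.real(zs @ t.conj())))
phi_exact = np.exp(-np.real(np.vdot(t, Sigma @ t)) / 4)
print(lhs, rhs, phi_mc, phi_exact)
```

The Monte Carlo estimate carries sampling error of order $1/\sqrt{N}$, so it agrees with the closed form only approximately.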


B.4.2  Basic  Properties  of  Characteristic  Functions 

Some of the work in this section was motivated by Arnold’s discussion [31] of moment generating functions for the case of multivariate distributions of real-valued matrices. His presentation is in his equations 17.4 through 17.9. Except as otherwise noted, I have supplied all the work in this section.

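Let the characteristic function of a matrix complex random variable be

$$\Phi_Z(T) = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HZ)])\} \qquad (B.39)$$

where $T = (t_{ij})$ is a matrix that has the same dimensions as $Z_{m\times n}$. Then

$$\Phi_Z(T) = \mathcal{E}\Big\{\exp\Big(i\operatorname{Re}\Big[\sum_{k=1}^{n}\sum_{j=1}^{m}\overline{t}_{jk}z_{jk}\Big]\Big)\Big\} \qquad (B.40)$$

where

$$\operatorname{tr}(T^HZ) = \operatorname{tr}\left[\begin{pmatrix}\overline{t}_{11} & \overline{t}_{21} & \cdots & \overline{t}_{m1}\\ \overline{t}_{12} & \overline{t}_{22} & \cdots & \overline{t}_{m2}\\ \vdots & & & \vdots\\ \overline{t}_{1n} & \overline{t}_{2n} & \cdots & \overline{t}_{mn}\end{pmatrix}\begin{pmatrix}z_{11} & z_{12} & \cdots & z_{1n}\\ z_{21} & z_{22} & \cdots & z_{2n}\\ \vdots & & & \vdots\\ z_{m1} & z_{m2} & \cdots & z_{mn}\end{pmatrix}\right] = \sum_{j=1}^{m}\overline{t}_{j1}z_{j1} + \sum_{j=1}^{m}\overline{t}_{j2}z_{j2} + \cdots + \sum_{j=1}^{m}\overline{t}_{jn}z_{jn} = \sum_{k=1}^{n}\sum_{j=1}^{m}\overline{t}_{jk}z_{jk} \qquad (B.41)$$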

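Proposition 23 Let $a \in \mathbb{C}$, and let $W = aZ$. Then

$$\Phi_{aZ}(T) = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HaZ)])\} = \Phi_Z(\overline{a}T) \qquad (B.42)$$

Theorem 18 Let $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{p\times r}$, $C \in \mathbb{C}^{m\times r}$, $Y_{m\times r} = AZB + C$, and $Z \in \mathbb{C}^{n\times p}$. Then

$$\Phi_Y(T) = \Phi_{AZB+C}(T) = \exp(i\operatorname{Re}[\operatorname{tr}(T^HC)])\,\Phi_Z(A^HTB^H)$$

Proof. The proof consists merely of following the definition and applying the algebra.

$$\begin{aligned}
\Phi_Y(T) = \Phi_{AZB+C}(T) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^H\{AZB + C\})])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HAZB) + \operatorname{tr}(T^HC)])\}\\
&= \exp(i\operatorname{Re}[\operatorname{tr}(T^HC)])\,\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HAZB)])\}\\
&= \exp(i\operatorname{Re}[\operatorname{tr}(T^HC)])\,\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(BT^HAZ)])\}\\
&= \exp(i\operatorname{Re}[\operatorname{tr}(T^HC)])\,\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}((A^HTB^H)^HZ)])\}\\
&= \exp(i\operatorname{Re}[\operatorname{tr}(T^HC)])\,\Phi_Z(A^HTB^H)
\end{aligned}$$

□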
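Both Proposition 23 and Theorem 18 can be checked exactly when $Z$ has a finitely supported distribution, because the expectation is then a finite sum. The sketch below is a minimal illustration that assumes NumPy. The helper `phi` and the random discrete distribution are hypothetical test fixtures, not part of the text.

```python
import numpy as np

def phi(Zs, p, T):
    """Phi_Z(T) = E{exp(i Re[tr(T^H Z)])} for a discrete random matrix that
    equals Zs[k] with probability p[k]."""
    inner = np.sum(np.conj(T)[None, :, :] * Zs, axis=(1, 2))   # tr(T^H Z_k) for each k
    return np.sum(p * np.exp(1j * np.real(inner)))

rng = np.random.default_rng(3)
cplx = lambda *shape: rng.normal(size=shape) + 1j * rng.normal(size=shape)
K, n, pdim, m, r = 5, 2, 3, 4, 2     # Z is n x p, A is m x n, B is p x r, C is m x r
Zs = cplx(K, n, pdim)
p = rng.random(K)
p /= p.sum()
A, B, C, T = cplx(m, n), cplx(pdim, r), cplx(m, r), cplx(m, r)

# Theorem 18: Phi_{AZB+C}(T) = exp(i Re[tr(T^H C)]) Phi_Z(A^H T B^H)
Ys = A @ Zs @ B + C                  # matmul broadcasts over the K outcomes
lhs18 = phi(Ys, p, T)
rhs18 = (np.exp(1j * np.real(np.trace(T.conj().T @ C)))
         * phi(Zs, p, A.conj().T @ T @ B.conj().T))

# Proposition 23: Phi_{aZ}(T) = Phi_Z(conj(a) T)
a = 0.7 - 1.3j
Tz = cplx(n, pdim)
lhs23, rhs23 = phi(a * Zs, p, Tz), phi(Zs, p, np.conj(a) * Tz)
print(lhs18, rhs18, lhs23, rhs23)
```

The two sides agree up to floating-point round-off. This is because the identities hold outcome by outcome, before the expectation is taken.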

A useful special application of this property is important enough to identify separately.

Corollary 5 Let $Z \in \mathbb{C}^{m\times n}$ have characteristic function $\Phi_Z(T)$. Suppose you want the characteristic function of a weighted sum $y$ of elements of $Z$. Let $A \in \mathbb{C}^{m\times n}$ specify the desired weights. Then $\Phi_y(t) = \Phi_Z(At)$ where $t$ is a scalar.

Proof. An arbitrary sum of weighted elements of $Z$ is given by $y = \sum_{j=1}^{m}\sum_{k=1}^{n}\overline{a}_{jk}z_{jk}$. By lemma 29, $y = \operatorname{tr}(A^HZ)$. Since $y$ is a scalar, its characteristic function is given by

$$\begin{aligned}
\Phi_y(t) &= \mathcal{E}\{\exp(i\operatorname{Re}[\overline{t}y])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\overline{t}\operatorname{tr}(A^HZ)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(\overline{t}A^HZ)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}((At)^HZ)])\} = \Phi_Z(At)
\end{aligned}$$

Note that matrix $A$ is the complex conjugate of the desired set of weights.

This was motivated by Goodman’s theorem (p. 169) [92] for $\Phi_{\operatorname{tr}Z}(t)$. To get the characteristic function of $Z_{ij}$, just set element $a_{ij} = 1$ and all other elements of $A$ to zero. To get the characteristic function of the sum of row $j$ of $Z$, set row $j$ of $A$ to $(1, 1, \ldots, 1)$ and all other elements of $A$ to zero.


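Proposition 24 Let $Z = (Z_1, Z_2)$ where $Z_1$ is $n \times p_1$. Similarly, let $T = (T_1, T_2)$ where $T_1$ is $n \times p_1$. Then

$$\Phi_Z(T_1, 0) = \Phi_{Z_1}(T_1)$$

Proof.

$$\begin{aligned}
\Phi_Z(T_1, 0) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}((T_1, 0)^H(Z_1, Z_2))])\}\\
&= \mathcal{E}\left\{\exp\left(i\operatorname{Re}\left[\operatorname{tr}\begin{pmatrix}T_1^H\\ 0\end{pmatrix}(Z_1, Z_2)\right]\right)\right\} = \mathcal{E}\left\{\exp\left(i\operatorname{Re}\left[\operatorname{tr}\begin{pmatrix}T_1^HZ_1 & T_1^HZ_2\\ 0 & 0\end{pmatrix}\right]\right)\right\} \qquad (B.43)\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T_1^HZ_1)])\} = \Phi_{Z_1}(T_1) \qquad (B.44)
\end{aligned}$$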


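Proposition 25 Similarly, let $Y = \begin{pmatrix}Y_1\\ Y_2\end{pmatrix}$ and $S = \begin{pmatrix}S_1\\ S_2\end{pmatrix}$ where $Y_1$ and $S_1$ are $n_1 \times p$. Then

$$\Phi_Y\begin{pmatrix}S_1\\ 0\end{pmatrix} = \Phi_{Y_1}(S_1)$$

Proof.

$$\begin{aligned}
\Phi_Y\begin{pmatrix}S_1\\ 0\end{pmatrix} &= \mathcal{E}\left\{\exp\left(i\operatorname{Re}\left[\operatorname{tr}\left(\begin{pmatrix}S_1\\ 0\end{pmatrix}^H\begin{pmatrix}Y_1\\ Y_2\end{pmatrix}\right)\right]\right)\right\} \qquad (B.45)\\
&= \mathcal{E}\left\{\exp\left(i\operatorname{Re}\left[\operatorname{tr}\left((S_1^H, 0)\begin{pmatrix}Y_1\\ Y_2\end{pmatrix}\right)\right]\right)\right\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(S_1^HY_1)])\} = \Phi_{Y_1}(S_1) \qquad (B.46)
\end{aligned}$$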


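Proposition 26 Let $Z = (Z_1, Z_2)$, $Z_1$ and $Z_2$ be independent, and $T = (T_1, T_2)$. Then

$$\Phi_Z(T) = \Phi_Z(T_1, 0)\,\Phi_Z(0, T_2)$$

Proof.

$$\begin{aligned}
\Phi_Z(T) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}((T_1, T_2)^H(Z_1, Z_2))])\} \qquad (B.47)\\
&= \mathcal{E}\left\{\exp\left(i\operatorname{Re}\left[\operatorname{tr}\begin{pmatrix}T_1^HZ_1 & T_1^HZ_2\\ T_2^HZ_1 & T_2^HZ_2\end{pmatrix}\right]\right)\right\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T_1^HZ_1) + \operatorname{tr}(T_2^HZ_2)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T_1^HZ_1)])\exp(i\operatorname{Re}[\operatorname{tr}(T_2^HZ_2)])\}
\end{aligned}$$

If $Z_1$ and $Z_2$ are independent, then this equals

$$\begin{aligned}
&\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T_1^HZ_1)])\}\,\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T_2^HZ_2)])\}\\
&= \Phi_{Z_1}(T_1)\Phi_{Z_2}(T_2) = \Phi_Z(T_1, 0)\Phi_Z(0, T_2) \qquad (B.48)
\end{aligned}$$

□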


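Proposition 27 Similarly, let $Y = \begin{pmatrix}Y_1\\ Y_2\end{pmatrix}$ and $S = \begin{pmatrix}S_1\\ S_2\end{pmatrix}$, where $Y_1$ and $Y_2$ are independent. Then

$$\Phi_Y(S) = \Phi_Y\begin{pmatrix}S_1\\ 0\end{pmatrix}\Phi_Y\begin{pmatrix}0\\ S_2\end{pmatrix}$$

Proof.

$$\begin{aligned}
\Phi_Y(S) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(S_1^HY_1 + S_2^HY_2)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(S_1^HY_1)])\exp(i\operatorname{Re}[\operatorname{tr}(S_2^HY_2)])\}
\end{aligned}$$

If $Y_1$ and $Y_2$ are independent, then

$$\begin{aligned}
\Phi_Y(S) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(S_1^HY_1)])\}\,\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(S_2^HY_2)])\}\\
&= \Phi_{Y_1}(S_1)\Phi_{Y_2}(S_2) = \Phi_Y\begin{pmatrix}S_1\\ 0\end{pmatrix}\Phi_Y\begin{pmatrix}0\\ S_2\end{pmatrix} \qquad (B.50)
\end{aligned}$$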


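Proposition 28 The characteristic function of the transpose of the random variable is

$$\Phi_{Z^T}(T) = \Phi_Z(T^T)$$

Proof.

$$\Phi_{Z^T}(T) = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HZ^T)])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}((Z\overline{T})^T)])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(Z\overline{T})])\}$$

since $\operatorname{tr}A^T = \operatorname{tr}A$, and this equals

$$\Phi_{Z^T}(T) = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(\overline{T}Z)])\} = \Phi_Z(T^T) \qquad (B.51)$$

□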


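Proposition 29 Similarly, the characteristic function of the Hermitian transpose of the random variable is

$$\Phi_{Z^H}(T) = \Phi_{\overline{Z}}(T^T)$$

Proof.

$$\begin{aligned}
\Phi_{Z^H}(T) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HZ^H)])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}((\overline{Z}\,\overline{T})^T)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(\overline{Z}\,\overline{T})])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(\overline{T}\,\overline{Z})])\} = \Phi_{\overline{Z}}(T^T) \qquad (B.52)
\end{aligned}$$

□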
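Propositions 26, 28 and 29 can be checked exactly on a finitely supported distribution. The sketch below is a minimal illustration that assumes NumPy; its helper names are hypothetical. For Proposition 26 it builds an independent pair $(Z_1, Z_2)$ by pairing every outcome of $Z_1$ with every outcome of $Z_2$ and multiplying their probabilities. For Propositions 28 and 29 it uses the same joint distribution.

```python
import numpy as np

def phi(Zs, p, T):
    """Phi_Z(T) = E{exp(i Re[tr(T^H Z)])} for a discrete random matrix that
    equals Zs[k] with probability p[k]."""
    inner = np.sum(np.conj(T)[None, :, :] * Zs, axis=(1, 2))
    return np.sum(p * np.exp(1j * np.real(inner)))

rng = np.random.default_rng(7)
cplx = lambda *shape: rng.normal(size=shape) + 1j * rng.normal(size=shape)
n, p1, p2, K1, K2 = 2, 1, 2, 3, 4

# Proposition 26: Z = (Z1, Z2) with Z1 (n x p1) independent of Z2 (n x p2)
Z1s, Z2s = cplx(K1, n, p1), cplx(K2, n, p2)
q1 = rng.random(K1); q1 /= q1.sum()
q2 = rng.random(K2); q2 /= q2.sum()
Zs = np.concatenate(
    [np.broadcast_to(Z1s[:, None], (K1, K2, n, p1)),
     np.broadcast_to(Z2s[None, :], (K1, K2, n, p2))], axis=-1
).reshape(K1 * K2, n, p1 + p2)          # every pairing of a Z1 outcome with a Z2 outcome
p = np.outer(q1, q2).ravel()            # independence: joint probability is the product
T1, T2 = cplx(n, p1), cplx(n, p2)
lhs26 = phi(Zs, p, np.hstack([T1, T2]))
rhs26 = phi(Zs, p, np.hstack([T1, 0 * T2])) * phi(Zs, p, np.hstack([0 * T1, T2]))

# Propositions 28 and 29 on the same distribution (Z is n x (p1 + p2))
S = cplx(p1 + p2, n)                    # transform variable for Z^T and Z^H
ZT = np.transpose(Zs, (0, 2, 1))
lhs28, rhs28 = phi(ZT, p, S), phi(Zs, p, S.T)
lhs29, rhs29 = phi(np.conj(ZT), p, S), phi(np.conj(Zs), p, S.T)
print(lhs26, rhs26, lhs28, rhs28, lhs29, rhs29)
```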

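Discussion. Note that in general there is no simple connection between $\Phi_Z(T)$ and $\Phi_{\overline{Z}}(T)$. In the following discussion, let $T = A + iB$ and $Z = C + iD$, where $A$, $B$, $C$, $D$ are real.

$$\begin{aligned}
\Phi_Z(T) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HZ)])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}([A + iB]^H[C + iD])])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}([A - iB]^T[C + iD])])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(A^TC + B^TD - iB^TC + iA^TD)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(A^TC + B^TD)])\}
\end{aligned}$$

and

$$\begin{aligned}
\Phi_{\overline{Z}}(T) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^H\overline{Z})])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}([A + iB]^H\,\overline{[C + iD]})])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}([A - iB]^T[C - iD])])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(A^TC - B^TD - iB^TC - iA^TD)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(A^TC - B^TD)])\}
\end{aligned}$$

As an additional curiosity,

$$\overline{\Phi_Z(T)} = \left[\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(A^TC + B^TD)])\}\right]^* = \mathcal{E}\{\exp(-i\operatorname{Re}[\operatorname{tr}(A^TC + B^TD)])\}$$

If $\operatorname{Re}(Z)$ and $\operatorname{Im}(Z)$ are independent, then

$$\begin{aligned}
\Phi_Z(T) &= \Phi_{\operatorname{Re}Z}(T)\,\Phi_{i\operatorname{Im}Z}(T) \qquad (B.53)\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(A^TC)])\}\,\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(B^TD)])\}\\
\Phi_{\overline{Z}}(T) &= \Phi_{\operatorname{Re}Z}(T)\,\Phi_{-i\operatorname{Im}Z}(T) = \Phi_{\operatorname{Re}Z}(T)\,\Phi_{i\operatorname{Im}(-Z)}(T)\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(A^TC)])\}\,\mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(-B^TD)])\} = \Phi_{\operatorname{Re}Z}(T)\,\overline{\Phi_{i\operatorname{Im}Z}(T)}
\end{aligned}$$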

The  characteristic  function  of  the  trace  of  a  random  square  matrix  can 
be  obtained  from  the  characteristic  function  of  the  random  square  matrix  by 
judicious  choice  of  the  transform  variable. 

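Theorem 19 Let $t \in \mathbb{C}$ and $Z \in \mathbb{C}^{n\times n}$. Then

$$\Phi_{\operatorname{tr}Z}(t) = \Phi_Z(tI_n)$$

This is stated in Goodman (p. 169) [92] without proof.

Proof.

$$\begin{aligned}
\Phi_{\operatorname{tr}Z}(t) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(\overline{t}(\operatorname{tr}Z))])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(\overline{t}Z)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(\overline{t}IZ)])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}((tI)^HZ)])\} = \Phi_Z(tI_n)
\end{aligned}$$

□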

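Theorem 20 Likewise, let $a \in \mathbb{C}^n$, $t \in \mathbb{C}$, and $Z \in \mathbb{C}^{n\times n}$. Then

$$\Phi_{a^HZa}(t) = \Phi_Z(taa^H)$$

Proof.

$$\begin{aligned}
\Phi_{a^HZa}(t) &= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(\overline{t}a^HZa)])\} = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(a\overline{t}a^HZ)])\}\\
&= \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}((taa^H)^HZ)])\} = \Phi_Z(taa^H)
\end{aligned}$$

since $t$ is a scalar.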
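Corollary 5 and Theorems 19 and 20 each reduce the characteristic function of a scalar built from $Z$ to $\Phi_Z$ at a particular transform matrix. The sketch below checks all three exactly on a finitely supported $Z$. It is a minimal illustration that assumes NumPy; the helper names are hypothetical. The scalar characteristic function used is $\mathcal{E}\{\exp(i\operatorname{Re}[\overline{t}y])\}$, as in the proof of Corollary 5.

```python
import numpy as np

def phi(Zs, p, T):
    """Phi_Z(T) = E{exp(i Re[tr(T^H Z)])} for a discrete random matrix."""
    inner = np.sum(np.conj(T)[None, :, :] * Zs, axis=(1, 2))
    return np.sum(p * np.exp(1j * np.real(inner)))

def phi_scalar(ys, p, t):
    """Characteristic function of a discrete complex scalar: E{exp(i Re[conj(t) y])}."""
    return np.sum(p * np.exp(1j * np.real(np.conj(t) * ys)))

rng = np.random.default_rng(4)
cplx = lambda *shape: rng.normal(size=shape) + 1j * rng.normal(size=shape)
K, n = 6, 3
Zs = cplx(K, n, n)
p = rng.random(K)
p /= p.sum()
t = 0.4 + 0.9j

# Corollary 5: y = tr(A^H Z) = sum_jk conj(A_jk) Z_jk, and Phi_y(t) = Phi_Z(A t)
A = cplx(n, n)
ys = np.sum(np.conj(A)[None] * Zs, axis=(1, 2))
cor5 = (phi_scalar(ys, p, t), phi(Zs, p, A * t))

# Theorem 19: Phi_{tr Z}(t) = Phi_Z(t I_n)
traces = np.trace(Zs, axis1=1, axis2=2)
thm19 = (phi_scalar(traces, p, t), phi(Zs, p, t * np.eye(n)))

# Theorem 20: Phi_{a^H Z a}(t) = Phi_Z(t a a^H)
a = cplx(n)
quad = np.einsum("i,kij,j->k", np.conj(a), Zs, a)   # a^H Z_k a for each outcome k
thm20 = (phi_scalar(quad, p, t), phi(Zs, p, t * np.outer(a, np.conj(a))))
print(cor5, thm19, thm20)
```

Note how Corollary 5 enters through `np.conj(A)`. The matrix passed to $\Phi_Z$ is the conjugate of the weights actually applied to $Z$.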

B.4.3  Just  a  Moment 

Except  for  the  statement  of  the  distribution  and  characteristic  function  of  the 
Cauchy  distribution,  this  section  was  supplied  by  me.  I  have  not  searched  the 
literature  diligently  for  these  results. 

Here we examine the properties of the characteristic function as a moment generating function. This study is the beginning of a discovery of a few interesting surprises. We begin with a result that is standard when dealing with real variables to see where it will lead us.

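Let $Z_{m\times n}$ be a complex matrix random variable with characteristic function $\Phi_Z(T) = \mathcal{E}\{\exp(i\operatorname{Re}[\operatorname{tr}(T^HZ)])\}$. We first want to find $\mathcal{E}\{Z\}$. Recall that

$$\Phi_Z(T) = \mathcal{E}\Big\{\exp\Big(i\operatorname{Re}\Big[\sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}Z_{jk}\Big]\Big)\Big\} = \mathcal{E}\Big\{\exp\Big(i\sum_{j=1}^{m}\sum_{k=1}^{n}(T_{Rjk}Z_{Rjk} + T_{Ijk}Z_{Ijk})\Big)\Big\}$$

where $T = T_R + iT_I$ and $Z = Z_R + iZ_I$. Then

$$\frac{\partial}{\partial T_{Rpq}}\Phi_Z(T)\Big|_{T=0} = \mathcal{E}\Big\{iZ_{Rpq}\exp\Big(i\Big[\sum_{j=1}^{m}\sum_{k=1}^{n}(T_{Rjk}Z_{Rjk} + T_{Ijk}Z_{Ijk})\Big]\Big)\Big\}\Big|_{T=0} = i\mathcal{E}\{Z_{Rpq}\} \qquad (B.54)$$

since expectation is a linear operator. We similarly take the partial derivative with respect to $T_{Ipq}$. Then

$$\mathcal{E}\{Z_{pq}\} = \mathcal{E}\{Z_{Rpq} + iZ_{Ipq}\} = \frac{1}{i}\left[\frac{\partial}{\partial T_{Rpq}}\Phi_Z(T)\Big|_{T=0} + i\frac{\partial}{\partial T_{Ipq}}\Phi_Z(T)\Big|_{T=0}\right]$$

Note that

$$\left(\frac{\partial}{\partial T_{Rpq}} + i\frac{\partial}{\partial T_{Ipq}}\right)f(T) \neq \frac{\partial}{\partial T_{pq}}f(T)$$

by the conditions leading to the Cauchy-Riemann equations. Thus the operator we want to use to find $\mathcal{E}\{Z_{pq}\}$ is not the complex derivative $\frac{\partial}{\partial T_{pq}}\Phi_Z(T)$. Let us use the operator we established by definition 4, given here by

$$D(T_{jk}) = \frac{\partial}{\partial T_{Rjk}} + i\frac{\partial}{\partial T_{Ijk}}$$

Then

$$\mathcal{E}\{Z_{jk}\} = (-i)D(T_{jk})\Phi_Z(T)\Big|_{T=0}$$

$\mathcal{E}\{Z_{jk}\}$ exists when $D(T_{jk})\Phi_Z(T)$ exists. Define

$$D(T) = \begin{pmatrix} D(T_{11}) & D(T_{12}) & \cdots & D(T_{1n})\\ D(T_{21}) & D(T_{22}) & \cdots & D(T_{2n})\\ \vdots & & & \vdots\\ D(T_{m1}) & D(T_{m2}) & \cdots & D(T_{mn}) \end{pmatrix}$$

Then

$$\mathcal{E}\{Z\} = (-i)D(T)\Phi_Z(T)\Big|_{T=0}$$

This is valid for arbitrary $m$ and $n$. Let $D^T(T)$ be the transpose of the operator matrix $D(T)$, and let $D^H(T)$ be the Hermitian transpose.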
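The first-moment formula can be checked with finite differences. The sketch below is a minimal illustration that assumes NumPy; `phi` and `D` are hypothetical helpers. It applies $D(T_{jk}) = \partial/\partial T_{Rjk} + i\,\partial/\partial T_{Ijk}$ numerically to $\Phi_Z$ for a finitely supported $Z$ and compares $(-i)D(T)\Phi_Z(T)|_{T=0}$ with the directly computed mean. It also illustrates why the ordinary complex derivative is not available. The real-direction derivative and $-i$ times the imaginary-direction derivative would have to agree for $\partial/\partial T_{pq}$ to exist, and they do not.

```python
import numpy as np

def phi(Zs, p, T):
    """Phi_Z(T) = E{exp(i Re[tr(T^H Z)])} for a discrete random matrix."""
    inner = np.sum(np.conj(T)[None, :, :] * Zs, axis=(1, 2))
    return np.sum(p * np.exp(1j * np.real(inner)))

def D(f, T, j, k, h=1e-5):
    """D(T_jk) f(T) = d f / d Re(T_jk) + i d f / d Im(T_jk), by central differences."""
    E = np.zeros_like(T)
    E[j, k] = 1.0
    d_re = (f(T + h * E) - f(T - h * E)) / (2 * h)
    d_im = (f(T + 1j * h * E) - f(T - 1j * h * E)) / (2 * h)
    return d_re + 1j * d_im

rng = np.random.default_rng(5)
K, m, n = 4, 2, 3
Zs = rng.normal(size=(K, m, n)) + 1j * rng.normal(size=(K, m, n))
p = rng.random(K)
p /= p.sum()
f = lambda T: phi(Zs, p, T)
T0 = np.zeros((m, n), dtype=complex)

# E{Z} = (-i) D(T) Phi_Z(T) at T = 0, entry by entry
EZ_from_phi = np.array([[-1j * D(f, T0, j, k) for k in range(n)] for j in range(m)])
EZ_direct = np.tensordot(p, Zs, axes=1)

# Cauchy-Riemann test for the (0, 0) entry: a complex derivative would need d_re == -i d_im
h = 1e-5
E = np.zeros((m, n), dtype=complex)
E[0, 0] = 1.0
d_re = (f(T0 + h * E) - f(T0 - h * E)) / (2 * h)            # equals i E{Re Z_00}
d_im = (f(T0 + 1j * h * E) - f(T0 - 1j * h * E)) / (2 * h)  # equals i E{Im Z_00}
print(np.max(np.abs(EZ_from_phi - EZ_direct)), d_re, -1j * d_im)
```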

What about higher moments? It is instructive to examine this. My intuition failed me. The wrong solution is to work with forms like

$$D^T(T)D(T)\Phi_Z(T)\Big|_{T=0}, \qquad D^2(T)\Phi_Z(T)\Big|_{T=0}, \qquad D^H(T)D(T)\Phi_Z(T)\Big|_{T=0},$$

etc. The right solution is to work with forms like

$$D(T^TT)\Phi_Z(T)\Big|_{T=0}, \qquad D(T^HT)\Phi_Z(T)\Big|_{T=0},$$

etc.

Although $D(T)D(T)\Phi_Z(T)$ is not what we want for computing expected values of moments like $\mathcal{E}\{Z^2\}$, it might be something to use in another context. Do not throw it away; just put it into your tool box. Recall that $\Phi_Z(T)$ is a scalar valued function. $D(T)\Phi_Z(T)$ is a matrix with the dimensions of $T \in \mathbb{C}^{m\times n}$. If we now look at $D(T_{jk})$ of that matrix, we get a matrix of the same size. When we look at $D(T)D(T)\Phi_Z(T)$ we get a matrix that lives in $\mathbb{C}^{m^2\times n^2}$. Likewise, $D^T(T)D(T)\Phi_Z(T)$ lives in $\mathbb{C}^{nm\times mn}$. Elements of

$$D^2(T)\Phi_Z(T) = D(T)D(T)\Phi_Z(T)$$

look like the following.

$$\begin{pmatrix} D(T_{11})D(T) & D(T_{12})D(T) & \cdots & D(T_{1n})D(T)\\ D(T_{21})D(T) & D(T_{22})D(T) & \cdots & D(T_{2n})D(T)\\ \vdots & & & \vdots\\ D(T_{m1})D(T) & D(T_{m2})D(T) & \cdots & D(T_{mn})D(T) \end{pmatrix}\Phi_Z(T)$$

Each block is itself an $m \times n$ array. For example, the top left block is

$$D(T_{11})D(T)\Phi_Z(T) = \begin{pmatrix} D(T_{11})D(T_{11}) & \cdots & D(T_{11})D(T_{1n})\\ \vdots & & \vdots\\ D(T_{11})D(T_{m1}) & \cdots & D(T_{11})D(T_{mn}) \end{pmatrix}\Phi_Z(T)$$

When this is evaluated at $T = 0$, we get

$$D(T)D(T)\Phi_Z(T)\Big|_{T=0} = -\begin{pmatrix} \mathcal{E}\{Z_{11}Z\} & \cdots & \mathcal{E}\{Z_{1n}Z\}\\ \vdots & & \vdots\\ \mathcal{E}\{Z_{m1}Z\} & \cdots & \mathcal{E}\{Z_{mn}Z\} \end{pmatrix} = -\mathcal{E}\{Z \otimes Z\}$$

whose typical element is $-\mathcal{E}\{Z_{jk}Z_{rs}\}$. This might be more useful than the result we are seeking.


If you compute

$$\Big(D(T)D(T)\Phi_Z(T) - [D(T)\Phi_Z(T)] \otimes [D(T)\Phi_Z(T)]\Big)\Big|_{T=0}$$

you obtain something that looks like a simple generalization of a covariance matrix. It is

$$-\begin{pmatrix}
\operatorname{cov}(Z_{11}, Z_{11}) & \cdots & \operatorname{cov}(Z_{1n}, Z_{1n})\\
\operatorname{cov}(Z_{11}, Z_{21}) & \cdots & \operatorname{cov}(Z_{1n}, Z_{2n})\\
\vdots & & \vdots\\
\operatorname{cov}(Z_{11}, Z_{m1}) & \cdots & \operatorname{cov}(Z_{1n}, Z_{mn})\\
\operatorname{cov}(Z_{21}, Z_{11}) & \cdots & \operatorname{cov}(Z_{2n}, Z_{1n})\\
\vdots & & \vdots\\
\operatorname{cov}(Z_{m1}, Z_{m1}) & \cdots & \operatorname{cov}(Z_{mn}, Z_{mn})
\end{pmatrix} = -\operatorname{cov}(Z, Z) = -[\mathcal{E}\{Z \otimes Z\} - \mathcal{E}\{Z\} \otimes \mathcal{E}\{Z\}]$$

where $\operatorname{cov}(U, V) = \mathcal{E}\{UV\} - \mathcal{E}\{U\}\mathcal{E}\{V\}$, formed without complex conjugation. For example, the top left element is $-(\mathcal{E}\{Z_{11}^2\} - [\mathcal{E}\{Z_{11}\}]^2)$.

Perhaps this is what we should want, even though it is not the standard construct we usually seek. The statistician will object to my using $\operatorname{cov}(Z_{11}, Z_{11})$ rather than $\operatorname{var}(Z_{11})$, but I wanted to make the point of the pattern that developed.
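The Kronecker layout of $\operatorname{cov}(Z, Z)$ can be checked directly. The sketch below is a minimal illustration that assumes NumPy and a finitely supported $Z$; its names are illustrative. With zero-based indices, `np.kron(Z, Z)` places $Z_{jk}Z_{rs}$ at row $jm + r$ and column $kn + s$. The check confirms that $\mathcal{E}\{Z \otimes Z\} - \mathcal{E}\{Z\} \otimes \mathcal{E}\{Z\}$ holds the unconjugated $\operatorname{cov}(Z_{jk}, Z_{rs})$ in that position. This unconjugated product is not the usual complex covariance $\mathcal{E}\{(Z - \mathcal{E}Z)(Z - \mathcal{E}Z)^H\}$.

```python
import numpy as np

rng = np.random.default_rng(6)
K, m, n = 5, 2, 2
Zs = rng.normal(size=(K, m, n)) + 1j * rng.normal(size=(K, m, n))
p = rng.random(K)
p /= p.sum()

EZ = np.tensordot(p, Zs, axes=1)
EZZ = np.tensordot(p, np.array([np.kron(Z, Z) for Z in Zs]), axes=1)   # E{Z (x) Z}
cov_kron = EZZ - np.kron(EZ, EZ)        # E{Z (x) Z} - E{Z} (x) E{Z}

def cov_entry(j, k, r, s):
    """Unconjugated 'covariance' E{Z_jk Z_rs} - E{Z_jk} E{Z_rs}."""
    return np.sum(p * Zs[:, j, k] * Zs[:, r, s]) - EZ[j, k] * EZ[r, s]

print(cov_kron[0, 0], cov_entry(0, 0, 0, 0))
```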

What I set out to find was the expected value of functions like $Z^TZ$ or $Z^HZ$. Each element of the resulting matrix $Y = Z^TZ$ is a linear combination of products of elements of $Z$. This is more complicated than our previous situation. Note that element $(j,k)$ is given by $\left(\sum_{i=1}^{m}Z_{ij}Z_{ik}\right)_{jk}$. From the work on the first moment of $Z$ we recall

$$D(T)\Phi_Z(T) = i\mathcal{E}\Big\{Z\exp\Big(i\sum_{j=1}^{m}\sum_{k=1}^{n}(T_{Rjk}Z_{Rjk} + T_{Ijk}Z_{Ijk})\Big)\Big\}$$

Element $(p,q)$ is given by

$$D(T_{pq})\Phi_Z(T) = i\mathcal{E}\Big\{Z_{pq}\exp\Big(i\Big[\sum_{j=1}^{m}\sum_{k=1}^{n}(T_{Rjk}Z_{Rjk} + T_{Ijk}Z_{Ijk})\Big]\Big)\Big\}$$

If we now apply $D(T_{rs})$ to this result, we get

$$D(T_{rs})D(T_{pq})\Phi_Z(T) = i^2\mathcal{E}\Big\{Z_{rs}Z_{pq}\exp\Big(i\Big[\sum_{j=1}^{m}\sum_{k=1}^{n}(T_{Rjk}Z_{Rjk} + T_{Ijk}Z_{Ijk})\Big]\Big)\Big\}$$

Evaluating this at $T = 0$ gives us

$$D(T_{rs})D(T_{pq})\Phi_Z(T)\Big|_{T=0} = -\mathcal{E}\{Z_{rs}Z_{pq}\}$$

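To obtain the expected value of element $(s,q)$ of $Y = Z^TZ$, I must find

$$\mathcal{E}\Big\{\sum_{i=1}^{m}Z_{is}Z_{iq}\Big\} = \sum_{i=1}^{m}\mathcal{E}\{Z_{is}Z_{iq}\} = -\sum_{i=1}^{m}D(T_{is})D(T_{iq})\Phi_Z(T)\Big|_{T=0}$$

To make the next step easy, define

$$D_*(T_{rs}T_{pq}) \overset{\text{def}}{=} D(T_{rs})D(T_{pq})$$

Note that this definition does not naturally follow from $D(T_{rs})$, but it will turn out to be a useful definition. The close critical reader would observe

$$D(T_{rs}T_{pq}) = D[\operatorname{Re}(T_{rs}T_{pq}) + i\operatorname{Im}(T_{rs}T_{pq})]$$

where

$$\operatorname{Re}(T_{rs}T_{pq}) = T_{Rrs}T_{Rpq} - T_{Irs}T_{Ipq}$$

and

$$\operatorname{Im}(T_{rs}T_{pq}) = T_{Irs}T_{Rpq} + T_{Rrs}T_{Ipq}$$

and thus

$$D(T_{rs}T_{pq}) = \frac{\partial}{\partial\operatorname{Re}(T_{rs}T_{pq})} + i\frac{\partial}{\partial\operatorname{Im}(T_{rs}T_{pq})}$$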

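This is not what I want. Using the proposed definition, then

$$\mathcal{E}\{Z^TZ\} = -\begin{pmatrix}
\sum_{i=1}^{m}D(T_{i1})D(T_{i1}) & \sum_{i=1}^{m}D(T_{i1})D(T_{i2}) & \cdots & \sum_{i=1}^{m}D(T_{i1})D(T_{in})\\
\sum_{i=1}^{m}D(T_{i2})D(T_{i1}) & \sum_{i=1}^{m}D(T_{i2})D(T_{i2}) & \cdots & \sum_{i=1}^{m}D(T_{i2})D(T_{in})\\
\vdots & & & \vdots\\
\sum_{i=1}^{m}D(T_{in})D(T_{i1}) & \sum_{i=1}^{m}D(T_{in})D(T_{i2}) & \cdots & \sum_{i=1}^{m}D(T_{in})D(T_{in})
\end{pmatrix}\Phi_Z(T)\Big|_{T=0}$$

$$= -\begin{pmatrix} D(T_{11}) & D(T_{21}) & \cdots & D(T_{m1})\\ D(T_{12}) & D(T_{22}) & \cdots & D(T_{m2})\\ \vdots & & & \vdots\\ D(T_{1n}) & D(T_{2n}) & \cdots & D(T_{mn}) \end{pmatrix}\begin{pmatrix} D(T_{11}) & D(T_{12}) & \cdots & D(T_{1n})\\ D(T_{21}) & D(T_{22}) & \cdots & D(T_{2n})\\ \vdots & & & \vdots\\ D(T_{m1}) & D(T_{m2}) & \cdots & D(T_{mn}) \end{pmatrix}\Phi_Z(T)\Big|_{T=0}$$

We have already seen that this is not

$$-D^T(T)D(T)\Phi_Z(T)\Big|_{T=0}$$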

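This is why we need the suggested definition. When used, we get

$$\mathcal{E}\{Z^TZ\} = -\begin{pmatrix}
\sum_{i=1}^{m}D_*(T_{i1}T_{i1}) & \sum_{i=1}^{m}D_*(T_{i1}T_{i2}) & \cdots & \sum_{i=1}^{m}D_*(T_{i1}T_{in})\\
\sum_{i=1}^{m}D_*(T_{i2}T_{i1}) & \sum_{i=1}^{m}D_*(T_{i2}T_{i2}) & \cdots & \sum_{i=1}^{m}D_*(T_{i2}T_{in})\\
\vdots & & & \vdots\\
\sum_{i=1}^{m}D_*(T_{in}T_{i1}) & \sum_{i=1}^{m}D_*(T_{in}T_{i2}) & \cdots & \sum_{i=1}^{m}D_*(T_{in}T_{in})
\end{pmatrix}\Phi_Z(T)\Big|_{T=0}$$

Extrapolating this concept,

$$\mathcal{E}\{Z^TZ\} = -D_*(T^TT)\Phi_Z(T)\Big|_{T=0}$$

when these exist. When $Z$ is a square matrix, then

$$\mathcal{E}\{Z^2\} = -D_*(T^2)\Phi_Z(T)\Big|_{T=0}$$

Further,

$$\mathcal{E}\{Z^k\} = (-i)^kD_*(T^k)\Phi_Z(T)\Big|_{T=0}$$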
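The second-moment rules $\mathcal{E}\{Z^2\} = -D_*(T^2)\Phi_Z(T)|_{T=0}$ and $\mathcal{E}\{Z^TZ\} = -D_*(T^TT)\Phi_Z(T)|_{T=0}$ can be checked by nesting finite-difference versions of $D(T_{jk})$. The sketch below is a minimal illustration that assumes NumPy and a finitely supported $Z$; `phi` and `D` are hypothetical helpers. Element $(j,k)$ of $D_*(T^2)$ is $\sum_i D(T_{ji})D(T_{ik})$, and element $(s,q)$ of $D_*(T^TT)$ is $\sum_i D(T_{is})D(T_{iq})$.

```python
import numpy as np

def phi(Zs, p, T):
    """Phi_Z(T) = E{exp(i Re[tr(T^H Z)])} for a discrete random matrix."""
    inner = np.sum(np.conj(T)[None, :, :] * Zs, axis=(1, 2))
    return np.sum(p * np.exp(1j * np.real(inner)))

def D(f, j, k, h):
    """Return the function T -> D(T_jk) f(T), with D = d/dRe + i d/dIm,
    estimated by central differences (so D can be applied twice)."""
    def g(T):
        E = np.zeros_like(T)
        E[j, k] = 1.0
        d_re = (f(T + h * E) - f(T - h * E)) / (2 * h)
        d_im = (f(T + 1j * h * E) - f(T - 1j * h * E)) / (2 * h)
        return d_re + 1j * d_im
    return g

rng = np.random.default_rng(8)
K, n = 4, 2
Zs = 0.5 * (rng.normal(size=(K, n, n)) + 1j * rng.normal(size=(K, n, n)))
p = rng.random(K)
p /= p.sum()
f = lambda T: phi(Zs, p, T)
T0 = np.zeros((n, n), dtype=complex)
h = 1e-3

# E{Z^2}: element (j, k) is -sum_i D(T_ji) D(T_ik) Phi_Z at T = 0
EZ2_from_phi = np.array([[-sum(D(D(f, i, k, h), j, i, h)(T0) for i in range(n))
                          for k in range(n)] for j in range(n)])
EZ2_direct = np.tensordot(p, Zs @ Zs, axes=1)

# E{Z^T Z}: element (s, q) is -sum_i D(T_is) D(T_iq) Phi_Z at T = 0
EZtZ_from_phi = np.array([[-sum(D(D(f, i, q, h), i, s, h)(T0) for i in range(n))
                           for q in range(n)] for s in range(n)])
EZtZ_direct = np.tensordot(p, np.transpose(Zs, (0, 2, 1)) @ Zs, axes=1)
print(np.max(np.abs(EZ2_from_phi - EZ2_direct)), np.max(np.abs(EZtZ_from_phi - EZtZ_direct)))
```

The nested differences carry an $O(h^2)$ truncation error, so agreement is approximate rather than exact.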

Example 1 Expectation of $Z^HAZ$

Let $A = A^H > 0$ be a matrix of complex constants, and let $Z$ be a complex random matrix variable with characteristic function $\Phi_Z(T)$. Then $A = B^HB$ and $Z^HAZ = Y^HY$ where $Y = BZ$.

$$\mathcal{E}\{Z^HAZ\} = \mathcal{E}\{Y^HY\} = -D_*(T^HT)\Phi_Y(T)\Big|_{T=0} = -D_*(T^HT)\Phi_{BZ}(T)\Big|_{T=0} = -D_*(T^HT)\Phi_Z(B^HT)\Big|_{T=0}$$

Similarly, $ZAZ^H = XX^H$ where $X = ZC$ and $A = CC^H$. Then

$$\mathcal{E}\{ZAZ^H\} = \mathcal{E}\{XX^H\} = -D_*(TT^H)\Phi_X(T)\Big|_{T=0} = -D_*(TT^H)\Phi_{ZC}(T)\Big|_{T=0} = -D_*(TT^H)\Phi_Z(TC^H)\Big|_{T=0}$$

In a similar manner, for $A^T = A = B^TB > 0$ then

$$\mathcal{E}\{Z^TAZ\} = -D_*(T^TT)\Phi_Z(B^HT)\Big|_{T=0}$$

Notice that it is still the Hermitian transpose of the square root of $A$ that appears as a factor in the transform variable matrix, even though we are dealing with the characteristic function of a symmetric matrix variable. As a reminder, this is a result of theorem 18.

Likewise, for $A^T = A = CC^T > 0$ then

$$\mathcal{E}\{ZAZ^T\} = -D_*(TT^T)\Phi_Z(TC^H)\Big|_{T=0}$$

Note that this general technique is not applicable for computing $\mathcal{E}\{ZAZ\}$ for $A > 0$ with no other restrictions on $Z$ and $A$. We observe that $A > 0$ implies there is a “square root” decomposition $A = CC$. Then $ZAZ = ZCCZ = XY$ where $X = ZC$ and $Y = CZ$. There is not a simple relation on $\Phi_Z(T)$ that yields $\mathcal{E}\{XY\}$ with this approach.

If $Z = Z^H$ and $A = A^H$ then

$$\mathcal{E}\{ZAZ\} = \mathcal{E}\{ZAZ^H\} = \mathcal{E}\{Z^HAZ\}$$

which were given above.
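The two steps Example 1 relies on can be checked on a finitely supported $Z$. The first is the factorization $Z^HAZ = Y^HY$ with $Y = BZ$ and $A = B^HB$. The second is the transform-variable shift $\Phi_{BZ}(T) = \Phi_Z(B^HT)$ from theorem 18. The sketch below is a minimal illustration that assumes NumPy. It takes $B$ from a Cholesky factorization (`np.linalg.cholesky` returns lower-triangular $L$ with $A = LL^H$, so $B = L^H$). That is just one convenient choice of square root, not the text's.

```python
import numpy as np

def phi(Zs, p, T):
    """Phi_Z(T) = E{exp(i Re[tr(T^H Z)])} for a discrete random matrix."""
    inner = np.sum(np.conj(T)[None, :, :] * Zs, axis=(1, 2))
    return np.sum(p * np.exp(1j * np.real(inner)))

rng = np.random.default_rng(9)
cplx = lambda *shape: rng.normal(size=shape) + 1j * rng.normal(size=shape)
K, m, n = 4, 3, 2
Zs = cplx(K, m, n)
p = rng.random(K)
p /= p.sum()

M = cplx(m, m)
A = M @ M.conj().T + np.eye(m)          # A = A^H > 0
L = np.linalg.cholesky(A)               # A = L L^H, L lower triangular
B = L.conj().T                          # hence A = B^H B

# Z^H A Z = Y^H Y with Y = B Z, outcome by outcome
ZH = np.conj(np.transpose(Zs, (0, 2, 1)))
Ys = B @ Zs
YH = np.conj(np.transpose(Ys, (0, 2, 1)))
same_quadratic_form = np.allclose(ZH @ A @ Zs, YH @ Ys)

# Theorem 18 with A -> B, B -> I, C -> 0: Phi_{BZ}(T) = Phi_Z(B^H T)
T = cplx(m, n)
lhs, rhs = phi(Ys, p, T), phi(Zs, p, B.conj().T @ T)
print(same_quadratic_form, lhs, rhs)
```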
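Example 2 Cauchy Distribution

The Cauchy distribution provides a simple example of a distribution that does not have a “well-defined” mean. Here, it serves as an example showing that existence of a characteristic function does not imply existence of moments. The probability density function of the Cauchy distribution is given by

$$f(x) = \frac{1}{\pi}\left(\frac{1}{1 + (x - \theta)^2}\right), \qquad x \in \mathbb{R}$$

Its characteristic function is

$$\phi(t) = \int_{-\infty}^{\infty}e^{itx}\,\frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}\,dx$$

Perform the change of variables $y = x - \theta$. Then $x = y + \theta$ and $dx = dy$, so

$$\phi(t) = \frac{e^{i\theta t}}{\pi}\int_{-\infty}^{\infty}\frac{e^{ity}}{1 + y^2}\,dy$$

In order to apply Gradshteyn and Ryzhik equation (3.354.5) [94], let $z = -y$. Then $y = -z$ and $dy = -dz$. Then

$$\phi(t) = \frac{e^{i\theta t}}{\pi}\int_{-\infty}^{\infty}\frac{e^{-itz}}{1 + z^2}\,dz$$

By equation (3.354.5), this is

$$\phi(t) = \frac{e^{i\theta t}}{\pi}\,\pi e^{-|t|} = e^{i\theta t - |t|}$$

When moments exist, they are found by differentiating the characteristic function and evaluating at $t = 0$. Here

$$\frac{d}{dt}\phi(t) = \left(i\theta - \frac{d}{dt}|t|\right)e^{i\theta t - |t|}$$

Note that

$$\frac{d}{dt}|t| = \begin{cases} -1, & t < 0\\ 1, & t > 0\\ \text{undefined}, & t = 0 \end{cases}$$

Thus

$$\frac{d}{dt}\phi(t)\Big|_{t=0}$$

is undefined, and $\mathcal{E}\{x\}$ does not exist.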
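The failure of the derivative at $t = 0$ is easy to see numerically. The sketch below is a minimal illustration that assumes NumPy and the location-only, unit-scale form $\phi(t) = e^{i\theta t - |t|}$ used above. It forms one-sided difference quotients at the origin. The right quotient tends to $i\theta - 1$ and the left quotient to $i\theta + 1$. Because the two limits differ, $\phi'(0)$ does not exist.

```python
import numpy as np

def cauchy_cf(t, theta):
    """exp(i*theta*t - |t|): characteristic function of the Cauchy density
    1 / (pi * (1 + (x - theta)**2)), location theta and unit scale."""
    return np.exp(1j * theta * t - np.abs(t))

theta = 0.5
h = 1e-6
right_quotient = (cauchy_cf(h, theta) - cauchy_cf(0.0, theta)) / h    # tends to i*theta - 1
left_quotient = (cauchy_cf(0.0, theta) - cauchy_cf(-h, theta)) / h    # tends to i*theta + 1
print(right_quotient, left_quotient)
```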


B.4.4 Uncharacteristic Functions for a Moment

All of the results in this section were supplied by me. I have not diligently searched the literature for these results.

Uncharacteristic Function A

Define function $\Psi_Z(T)$ to be a function that maps $(Z,T) \to \mathbb{C}$ where $Z, T \in \mathbb{C}^{m\times n}$ and all elements of $T$ are algebraically independent. Let $Z$ be a matrix complex random variable with distribution function $dF(Z)$. Finally, let

$$\Psi_Z(T) = \mathcal{E}\{\exp[i\operatorname{tr}(T^TZ)]\} = \int_Z\exp[i\operatorname{tr}(T^TZ)]\,dF(Z)$$

When we expand the definition, we get

$$\Psi_Z(T) = \mathcal{E}\Big\{\exp\Big[i\sum_{j=1}^{m}\sum_{k=1}^{n}T_{jk}Z_{jk}\Big]\Big\}$$

If we take the derivative we obtain

$$\frac{\partial}{\partial T_{pq}}\Psi_Z(T) = \frac{\partial}{\partial T_{pq}}\int_Z\exp\Big[i\sum_{j=1}^{m}\sum_{k=1}^{n}T_{jk}Z_{jk}\Big]dF(Z) = \int_Z\frac{\partial}{\partial T_{pq}}\exp\Big[i\sum_{j=1}^{m}\sum_{k=1}^{n}T_{jk}Z_{jk}\Big]dF(Z)$$

where we assumed it is legal to interchange the derivative and the integral. Applying the derivative we obtain

$$\frac{\partial}{\partial T_{pq}}\Psi_Z(T) = \mathcal{E}\Big\{iZ_{pq}\exp\Big[i\sum_{j=1}^{m}\sum_{k=1}^{n}T_{jk}Z_{jk}\Big]\Big\}$$

When we evaluate this expression at $T = 0$, then

$$\frac{\partial}{\partial T_{pq}}\Psi_Z(T)\Big|_{T=0} = i\mathcal{E}\{Z_{pq}\}$$

or, solving for the first moment,

$$\mathcal{E}\{Z_{pq}\} = -i\frac{\partial}{\partial T_{pq}}\Psi_Z(T)\Big|_{T=0}$$

Extending this to the derivative with respect to a matrix, we obtain

$$\frac{\partial}{\partial T}\Psi_Z(T)\Big|_{T=0} = i\mathcal{E}\{Z\}$$

or

$$\mathcal{E}\{Z\} = -i\frac{\partial}{\partial T}\Psi_Z(T)\Big|_{T=0}$$

This is a property we seek in usual work with characteristic functions, in that here we are using a true derivative rather than a special differential operator. Now, let $Z, T \in M_n(\mathbb{C})$. We need the property that

$$\langle T, Z\rangle = \langle Z, T\rangle^*$$
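Unlike $\Phi_Z$, the function $\Psi_Z$ of Uncharacteristic Function A is holomorphic in each $T_{pq}$, so a true complex derivative exists. The sketch below is a minimal illustration that assumes NumPy and a finitely supported $Z$; the helper `psi` is hypothetical. It estimates the derivative at $T = 0$ by stepping along the real axis and, separately, along the imaginary axis of $T_{pq}$. The two estimates should agree, and $-i$ times either one should recover $\mathcal{E}\{Z\}$.

```python
import numpy as np

def psi(Zs, p, T):
    """Uncharacteristic function A: Psi_Z(T) = E{exp[i tr(T^T Z)]} for a
    discrete random matrix; tr(T^T Z) = sum_jk T_jk Z_jk (no conjugation)."""
    inner = np.sum(T[None, :, :] * Zs, axis=(1, 2))
    return np.sum(p * np.exp(1j * inner))

rng = np.random.default_rng(10)
K, m, n = 4, 2, 2
Zs = rng.normal(size=(K, m, n)) + 1j * rng.normal(size=(K, m, n))
p = rng.random(K)
p /= p.sum()

h = 1e-6
T0 = np.zeros((m, n), dtype=complex)
dpsi_real_dir = np.zeros((m, n), dtype=complex)   # step along the real axis of T_pq
dpsi_imag_dir = np.zeros((m, n), dtype=complex)   # step along the imaginary axis of T_pq
for j in range(m):
    for k in range(n):
        E = np.zeros((m, n), dtype=complex)
        E[j, k] = 1.0
        dpsi_real_dir[j, k] = (psi(Zs, p, T0 + h * E) - psi(Zs, p, T0 - h * E)) / (2 * h)
        dpsi_imag_dir[j, k] = (psi(Zs, p, T0 + 1j * h * E)
                               - psi(Zs, p, T0 - 1j * h * E)) / (2j * h)

EZ_from_psi = -1j * dpsi_real_dir          # E{Z} = -i dPsi/dT at T = 0
EZ_direct = np.tensordot(p, Zs, axes=1)
print(np.max(np.abs(dpsi_real_dir - dpsi_imag_dir)), np.max(np.abs(EZ_from_psi - EZ_direct)))
```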
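Uncharacteristic Function B

Define $\Omega_Z(T): (Z,T) \in \mathbb{C}^{m\times n} \to \mathbb{C}$ by $\Omega_Z(T) = \mathcal{E}\{\exp[i\operatorname{tr}(T^HZ)]\}$. Then we get

$$\Omega_Z(T) = \mathcal{E}\Big\{\exp\Big[i\sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}Z_{jk}\Big]\Big\}$$

Recall that $\frac{\partial}{\partial T_{pq}}\overline{T}_{pq}$ does not exist anywhere. To obtain our moments we must look at $\frac{\partial}{\partial\overline{T}_{pq}}$ and its matrix extension $\frac{\partial}{\partial\overline{T}}$. Then we get

$$\frac{\partial}{\partial\overline{T}_{pq}}\Omega_Z(T) = \mathcal{E}\Big\{iZ_{pq}\exp\Big[i\sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}Z_{jk}\Big]\Big\}$$

and

$$\frac{\partial}{\partial\overline{T}_{pq}}\Omega_Z(T)\Big|_{T=0} = i\mathcal{E}\{Z_{pq}\}$$

For $X \in \mathbb{C}$, we note that $X = 0$ implies $\overline{X} = 0$. We thus get the nearly familiar result

$$\mathcal{E}\{Z\} = (-i)\frac{\partial}{\partial\overline{T}}\Omega_Z(T)\Big|_{T=0}$$

$\Omega_Z(T)$ may have some nice properties because $\operatorname{tr}(T^HZ)$ defines an inner product $\langle T, Z\rangle$. We verify this.

1.

$$\langle T, Z\rangle = \sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}Z_{jk} = \Big(\sum_{j=1}^{m}\sum_{k=1}^{n}\overline{Z}_{jk}T_{jk}\Big)^* = \langle Z, T\rangle^*$$

2.

$$\langle T, Z + X\rangle = \sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}(Z_{jk} + X_{jk}) = \sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}Z_{jk} + \sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}X_{jk} = \langle T, Z\rangle + \langle T, X\rangle$$

where also $X \in \mathbb{C}^{m\times n}$.

3.

$$\langle T, \alpha Z\rangle = \sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}\,\alpha Z_{jk} = \alpha\langle T, Z\rangle$$

4.

$$\langle T, T\rangle = \sum_{j=1}^{m}\sum_{k=1}^{n}\overline{T}_{jk}T_{jk} = \sum_{j=1}^{m}\sum_{k=1}^{n}|T_{jk}|^2 \geq 0$$

for all $T \in \mathbb{C}^{m\times n}$.

5.

$$\langle T, T\rangle = 0$$

if and only if $T = 0$.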
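The five inner-product properties of $\langle T, Z\rangle = \operatorname{tr}(T^HZ)$ listed above can be spot-checked numerically. The sketch below is a minimal illustration that assumes NumPy. It relies on `np.vdot`, which flattens both arguments and conjugates the first, so that `np.vdot(T, Z)` equals $\sum_{jk}\overline{T}_{jk}Z_{jk}$. The dictionary of named checks is only an illustrative device.

```python
import numpy as np

def inner(T, Z):
    """<T, Z> = tr(T^H Z) = sum_jk conj(T_jk) Z_jk; np.vdot flattens both
    arrays and conjugates its first argument."""
    return np.vdot(T, Z)

rng = np.random.default_rng(11)
cplx = lambda *shape: rng.normal(size=shape) + 1j * rng.normal(size=shape)
m, n = 2, 3
T, Z, X = cplx(m, n), cplx(m, n), cplx(m, n)
alpha = 1.5 - 0.5j

checks = {
    "1 conjugate symmetry": np.isclose(inner(T, Z), np.conj(inner(Z, T))),
    "2 additivity": np.isclose(inner(T, Z + X), inner(T, Z) + inner(T, X)),
    "3 homogeneity": np.isclose(inner(T, alpha * Z), alpha * inner(T, Z)),
    "4 positivity": bool(np.isclose(inner(T, T).imag, 0) and inner(T, T).real > 0),
    "4 sum of squared moduli": np.isclose(inner(T, T), np.sum(np.abs(T) ** 2)),
    "5 zero at zero": inner(np.zeros((m, n)), np.zeros((m, n))) == 0,
    "trace form": np.isclose(inner(T, Z), np.trace(T.conj().T @ Z)),
}
print(checks)
```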

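Uncharacteristic Function C

Let $T, Y, Z \in \mathbb{C}^{m\times n}$ and $\alpha \in \mathbb{C}$. Let $g(Z)$ be a linear function. Thus

$$g(Y + \alpha Z) = g(Y) + \alpha g(Z)$$

Let $\langle T, Z\rangle$ be an inner product on the set of matrices in $\mathbb{C}^{m\times n}$. Define

$$\nu_Z(T) = \mathcal{E}\{\exp[ig(\langle T, Z\rangle)]\}$$

Let us consider the properties of $\nu_Z(T)$.

$$\nu_{\alpha Z}(T) = \mathcal{E}\{\exp[ig(\langle T, \alpha Z\rangle)]\} = \mathcal{E}\{\exp[ig(\alpha\langle T, Z\rangle)]\} = \mathcal{E}\{\exp[i\alpha g(\langle T, Z\rangle)]\}$$

Also

$$\nu_{\alpha Z}(T) = \mathcal{E}\{\exp[ig(\langle\overline{\alpha}T, Z\rangle)]\} = \nu_Z(\overline{\alpha}T)$$

Let $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{p\times r}$, $C \in \mathbb{C}^{m\times r}$, $Y_{m\times r} = AZB + C$, $T \in \mathbb{C}^{m\times r}$, and $Z \in \mathbb{C}^{n\times p}$. Then

$$\begin{aligned}
\nu_Y(T) = \nu_{AZB+C}(T) &= \mathcal{E}\{\exp[ig(\langle T, AZB + C\rangle)]\}\\
&= \mathcal{E}\{\exp[ig(\langle T, AZB\rangle + \langle T, C\rangle)]\}\\
&= \mathcal{E}\{\exp[ig(\langle T, AZB\rangle) + ig(\langle T, C\rangle)]\}\\
&= \mathcal{E}\{\exp[ig(\langle T, AZB\rangle)]\exp[ig(\langle T, C\rangle)]\}
\end{aligned}$$

Since $C$ is a matrix of constants, we can write this as

$$\mathcal{E}\{\exp[ig(\langle T, AZB\rangle)]\}\exp[ig(\langle T, C\rangle)]$$

When $A^H$ is the adjoint of $A$ we get

$$\mathcal{E}\{\exp[ig(\langle A^HT, ZB\rangle)]\}\exp[ig(\langle T, C\rangle)] = \nu_{ZB}(A^HT)\exp[ig(\langle T, C\rangle)]$$

When $g(\langle X, Y\rangle) = \operatorname{tr}(X^HY)$ for conformable $X$ and $Y$, then

$$\nu_Y(T) = \mathcal{E}\{\exp[i\operatorname{tr}(BT^HAZ)]\}\exp[i\operatorname{tr}(T^HC)] = \nu_Z(A^HTB^H)\exp[i\operatorname{tr}(T^HC)]$$

which we established earlier in a similar form for $g(\langle X, Y\rangle) = \operatorname{Re}[\operatorname{tr}(X^HY)]$.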

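Partition $Z = (Z_1, Z_2)$ where $Z_1$ is $n \times p_1$. Similarly, let $T = (T_1, T_2)$ where $T_1$ is $n \times p_1$. Then

$$\nu_Z(T_1, 0) = \mathcal{E}\{\exp[ig(\langle T, Z\rangle)]\} = \mathcal{E}\{\exp[ig(\langle (T_1, 0), (Z_1, Z_2)\rangle)]\}$$

Let $\langle X, Y\rangle = h(X^HY)$. Then

$$\nu_Z(T_1, 0) = \mathcal{E}\left\{\exp\left[ig\left(h\left(\begin{pmatrix}T_1^H\\ 0\end{pmatrix}(Z_1, Z_2)\right)\right)\right]\right\} = \mathcal{E}\left\{\exp\left[ig\left(h\begin{pmatrix}T_1^HZ_1 & T_1^HZ_2\\ 0 & 0\end{pmatrix}\right)\right]\right\}$$

If $h(X)$ is a function of only square submatrices on the main diagonal of $X$, then we have

$$\nu_Z(T_1, 0) = \mathcal{E}\left\{\exp\left[ig\left(h\begin{pmatrix}T_1^HZ_1 & *\\ * & 0\end{pmatrix}\right)\right]\right\}$$

where $*$ marks blocks that do not affect $h$. If $h(X) = \operatorname{tr}(X)$ then

$$\nu_Z(T_1, 0) = \mathcal{E}\{\exp[ig(\operatorname{tr}(T_1^HZ_1))]\} = \mathcal{E}\{\exp[ig(\langle T_1, Z_1\rangle)]\} = \nu_{Z_1}(T_1)$$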


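Now partition $Z = \begin{pmatrix}Z_1\\ Z_2\end{pmatrix}$ and $T = \begin{pmatrix}T_1\\ T_2\end{pmatrix}$, where $Z_1$ and $T_1$ are $n_1 \times p$. Then

$$\nu_Z\begin{pmatrix}T_1\\ 0\end{pmatrix} = \mathcal{E}\left\{\exp\left[ig\left(\left\langle\begin{pmatrix}T_1\\ 0\end{pmatrix}, \begin{pmatrix}Z_1\\ Z_2\end{pmatrix}\right\rangle\right)\right]\right\}$$

Let $\langle X, Y\rangle = h(X^HY)$. Then we have

$$\nu_Z\begin{pmatrix}T_1\\ 0\end{pmatrix} = \mathcal{E}\left\{\exp\left[ig\left(h\left[(T_1^H, 0)\begin{pmatrix}Z_1\\ Z_2\end{pmatrix}\right]\right)\right]\right\} = \mathcal{E}\{\exp[ig(h(T_1^HZ_1))]\} = \mathcal{E}\{\exp[ig(\langle T_1, Z_1\rangle)]\} = \nu_{Z_1}(T_1)$$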

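Let $Z = (Z_1, Z_2)$ and $T = (T_1, T_2)$. Then

$$\nu_Z(T) = \mathcal{E}\{\exp[ig(\langle T, Z\rangle)]\} = \mathcal{E}\{\exp[ig(h[(T_1, T_2)^H (Z_1, Z_2)])]\} = \mathcal{E}\left\{\exp\left[ig\left(h\begin{pmatrix} T_1^H Z_1 & T_1^H Z_2 \\ T_2^H Z_1 & T_2^H Z_2 \end{pmatrix}\right)\right]\right\}$$

where $\langle T, Z\rangle = h(T^H Z)$. When $h$ is a function of only the square submatrices on the main diagonal, this is

$$\nu_Z(T) = \mathcal{E}\left\{\exp\left[ig\left(h\begin{pmatrix} T_1^H Z_1 & 0 \\ 0 & T_2^H Z_2 \end{pmatrix}\right)\right]\right\}$$

When $h(X) = \mathrm{tr}(X)$, then

$$\begin{aligned}
\nu_Z(T) &= \mathcal{E}\{\exp[ig(\mathrm{tr}(T_1^H Z_1) + \mathrm{tr}(T_2^H Z_2))]\} \\
&= \mathcal{E}\{\exp[ig(\mathrm{tr}(T_1^H Z_1)) + ig(\mathrm{tr}(T_2^H Z_2))]\} \\
&= \mathcal{E}\{\exp[ig(\mathrm{tr}(T_1^H Z_1))]\exp[ig(\mathrm{tr}(T_2^H Z_2))]\}
\end{aligned}$$

When $Z_1$ and $Z_2$ are independent, then

$$\nu_Z(T) = \mathcal{E}\{\exp[ig(\mathrm{tr}(T_1^H Z_1))]\}\,\mathcal{E}\{\exp[ig(\mathrm{tr}(T_2^H Z_2))]\} = \nu_{Z_1}(T_1)\,\nu_{Z_2}(T_2) = \nu_Z(T_1, 0)\,\nu_Z(0, T_2)$$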
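The factorization can be checked exactly on a small discrete example, where the expectation is a finite sum. The sketch below is illustrative only: it assumes $g = \mathrm{Re}$ (so that $\nu$ is an ordinary characteristic function), $\langle X, Y\rangle = \mathrm{tr}(X^H Y)$, and made-up two- and three-point distributions for $Z_1$ and $Z_2$. Independence is imposed by forming the joint law as the product of the marginal laws.

```python
import cmath
from itertools import product

def inner(X, Y):
    """<X, Y> = tr(X^H Y) for matrices stored as lists of rows."""
    return sum(x.conjugate() * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def hcat(X, Y):
    """The column partition (X, Y): X and Y side by side."""
    return [list(rx) + list(ry) for rx, ry in zip(X, Y)]

def char_fn(dist, T):
    """nu(T) = E{exp[i g(<T, Z>)]} with g = Re, for a finite law [(prob, Z), ...]."""
    return sum(p * cmath.exp(1j * inner(T, Z).real) for p, Z in dist)

# Hypothetical finite distributions: Z1 is 2 x 1 and Z2 is 2 x 2.
dist1 = [(0.3, [[1 + 1j], [0.5j]]),
         (0.7, [[-2.0], [1 - 1j]])]
dist2 = [(0.5, [[1j, 2.0], [0.0, -1.0]]),
         (0.25, [[1.0, 1j], [1j, 1.0]]),
         (0.25, [[0.0, 0.0], [3.0, -2j]])]

# Independence: the joint law of Z = (Z1, Z2) is the product of the marginal laws.
joint = [(p1 * p2, hcat(Z1, Z2)) for (p1, Z1), (p2, Z2) in product(dist1, dist2)]

T1 = [[0.4 - 0.2j], [1.1j]]
T2 = [[0.3, -0.7j], [0.5 + 0.5j, 0.2]]
lhs = char_fn(joint, hcat(T1, T2))               # nu_Z(T1, T2)
rhs = char_fn(dist1, T1) * char_fn(dist2, T2)    # nu_{Z1}(T1) nu_{Z2}(T2)
print(abs(lhs - rhs))
```

Setting $T_2 = 0$ in the same code also reproduces $\nu_Z(T_1, 0) = \nu_{Z_1}(T_1)$.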

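Similarly, for $Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$, $T = \begin{pmatrix} T_1 \\ T_2 \end{pmatrix}$, and $Z_1$ independent of $Z_2$, we obtain

$$\nu_Z\begin{pmatrix} T_1 \\ T_2 \end{pmatrix} = \nu_Z\begin{pmatrix} T_1 \\ 0 \end{pmatrix}\nu_Z\begin{pmatrix} 0 \\ T_2 \end{pmatrix}$$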

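Consider $Z^T$ and $Z^H$, where both are in $\mathbb{C}^{n \times m}$ and hence $Z$ and $Z^*$ are in $\mathbb{C}^{m \times n}$. Then

$$\nu_{Z^T}(T) = \mathcal{E}\{\exp[ig(\langle T, Z^T\rangle)]\}$$

When $\langle X, Y\rangle = h(\mathrm{tr}(X^H Y))$, then

$$\nu_{Z^T}(T) = \mathcal{E}\{\exp[ig(h[\mathrm{tr}(T^H Z^T)])]\} = \mathcal{E}\{\exp[ig(h[\mathrm{tr}((ZT^*)^T)])]\} = \mathcal{E}\{\exp[ig(h[\mathrm{tr}(ZT^*)])]\}$$

since $\mathrm{tr}\,X^T = \mathrm{tr}\,X$. Hence

$$\nu_{Z^T}(T) = \mathcal{E}\{\exp[ig(h[\mathrm{tr}(T^* Z)])]\} = \nu_Z(T^T)$$

Similarly,

$$\nu_{Z^H}(T) = \mathcal{E}\{\exp[ig(h[\mathrm{tr}(T^H Z^H)])]\} = \mathcal{E}\{\exp[ig(h[\mathrm{tr}((ZT)^*)])]\} = \mathcal{E}\{\exp[ig(h[\mathrm{tr}(T^* Z^*)])]\} = \nu_{Z^*}(T^T)$$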
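Let $Z = X + iY$ and $T = R + iS$. Then

$$\begin{aligned}
\nu_Z(T) &= \mathcal{E}\{\exp[ig(\langle R + iS, X + iY\rangle)]\} \\
&= \mathcal{E}\{\exp[ig(\langle R + iS, X\rangle + i\langle R + iS, Y\rangle)]\} \\
&= \mathcal{E}\{\exp[ig(\langle R, X\rangle - i\langle S, X\rangle + i\langle R, Y\rangle + \langle S, Y\rangle)]\} \\
&= \mathcal{E}\{\exp[i\{g(\langle R, X\rangle) + g(\langle S, Y\rangle)\} + \{g(\langle S, X\rangle) - g(\langle R, Y\rangle)\}]\}
\end{aligned}$$

Similarly,

$$\begin{aligned}
\nu_{Z^*}(T) &= \mathcal{E}\{\exp[ig(\langle R + iS, X - iY\rangle)]\} \\
&= \mathcal{E}\{\exp[i\{g(\langle R, X\rangle) - g(\langle S, Y\rangle)\} + \{g(\langle R, Y\rangle) + g(\langle S, X\rangle)\}]\}
\end{aligned}$$

and

$$\nu_Z^*(T) = \mathcal{E}\{\exp[-i\{g(\langle R, X\rangle) + g(\langle S, Y\rangle)\} + \{g(\langle S, X\rangle) - g(\langle R, Y\rangle)\}]\}$$

If $\mathrm{Re}(Z)$ and $\mathrm{Im}(Z)$ are independent, then

$$\nu_Z(T) = \nu_X(T)\,\nu_{iY}(T) = \nu_X(T)\,\nu_Y(-iT)$$

and

$$\nu_{Z^*}(T) = \nu_X(T)\,\nu_{(-i)Y}(T) = \nu_X(T)\,\nu_Y(iT)$$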


Appendix  C 
COMPLEX CHANGE OF VARIABLES

C.1 Introduction to Changing Variables for the Complex Case

The purpose of this chapter is to develop those Jacobians needed for the changes of complex variables required by the distributional results of this thesis. The theory for change of variables has long been worked out, but the specific forms required for applications in multivariate statistics have not been systematically worked out for the complex variables case. Only isolated results appear in the literature, and I have not found some of the results needed for this thesis.

There are several issues that have arisen in this thesis related to this topic. The first is a need to recognize the difference between a mapping and a change of variables. We have unfortunately created confusion through the ambiguity of the American language by referring to both situations with the same terminology, whether we speak of transformations, mappings, or changing variables. At the abstract level, the basic difference is whether or not you are changing the measure involved. These must, in turn, be distinguished from mere renaming of variables, which is trivial and not discussed further. The picky reader can consider renaming to be a trivial change of variables. The second issue is that changing variables in the complex space $\mathbb{C}^n$ is like changing variables in the real space $\mathbb{R}^{2n}$, with a need to pay special attention to the imposed algebraic structure. The third issue is the lack of results in the literature that apply to this thesis, or to the statistics of complex variables in general.

There is a fourth issue which will not be dwelt on in this thesis, but it is important when reading the literature. When comparing results in the literature, very close attention must be paid to what assumptions are being made. It is not uncommon for $\mathbb{C}^n$ to be treated like $\mathbb{R}^{2n}$, with the results expressed in terms of $\mathbb{R}^{2n}$ but discussed as if they were expressed in terms of $\mathbb{C}^n$. This point is also made in the work on complex differentiation. It is important enough to be said twice.

The first issue is really not a trivial issue. If you apply a transformation to a variable, are you engaged in changing variables or are you merely picking a new point in space at which to evaluate your function? If you are interested in the invariance properties of a particular function or measure, you want to examine or describe the effect of that measure as you move about among your various measurable sets. You could properly want to know the relationship between different points or subsets in a space for which the measure is invariant. A rule used to choose one point in a space when you are given another point in a space is a transformation or function that does a mapping, but it is not a change of variables if your object is to retain the same measure and observe its performance as the data change.

A  transformation  is  a  change  of  variables  situation  when  the  intent  is  to 
change  the  measure  being  used.  The  usual  situation  is  that  you  want  to  rescale 
your  data  to  make  it  conceptually  or  mathematically  easier  to  work  with  or 
explain,  but  you  want  resulting  integrations  done  with  the  new  scaling  to 
provide  the  same  answer  as  the  integrations  over  the  same  set  made  with  the 
previous  scaling.  A  more  generalized  concept  is  to  allow  the  outcome  of  the 
integration  to  change  by  a  known  function,  not  necessarily  constant,  but  I 
have  not  seen  that  discussed  in  any  texts. 

For the second point, it is not a surprise, but it is important, that you must be diligent to consider both the real and imaginary parts of a complex variable. In applications, the differences show up in the structure of multivariate data. You already know to take into account the effects of repeated block structure, bandedness, etc. There is one more component of the structure of the variable to consider: the effects of conjugation, summarized in Table C.1. Structure is important. The theory of zonal polynomials and group representation theory as applied to complex variables has been developed by others for the complex symmetric case but not for the Hermitian case. Only Gross and Richards [96] have addressed the complex Hermitian case.
Table C.1. Structure in Complex Variables

Matrix Type       Meaning      Diagonal Elements (z_ii = x + iy)    Restrictions
symmetric         Z = Z^T      x + iy = x + iy                      no restrictions
Hermitian         Z = Z^H      x + iy = x − iy                      z_ii ∈ R (y = 0)
skew-symmetric    Z = −Z^T     x + iy = −x − iy                     z_ii = 0
skew-Hermitian    Z = −Z^H     x + iy = −x + iy                     z_ii purely imaginary (x = 0)

The  third  point  is  that  the  usual  discussions  about  changes  of  variables 
do  not  address  the  implementation  of  the  principles  to  complex  variables. 
The  mathematician  would  say  that  it  is  not  necessary  because  the  theory 
has  been  worked  out  for  more  general  spaces.  The  engineer  rarely  studies 
those  more  general  spaces,  and  he  has  a  daily  need  to  work  in  the  complex 
field.  This  chapter  archives  a  systematic  development  which  includes  results 
specific  to  this  thesis  and  is  useful  for  other  future  complex  variable  work. 
Perhaps  this  could  be  called  the  engineering  of  mathematics,  if  engineering  is 
the  development  of  methods  and  tools  for  translating  theory  into  application. 

A feature of this chapter that will be nearly novel to engineers is the application of exterior products. Exterior products go by several names, depending on the discipline of the individual doing the discussion; another popular name for them is wedge products. Wedge products greatly simplify the computation of Jacobians for nonlinear changes of variables. The root of using wedge products for change of variables problems is found in differential geometry. Rudin (pp. 253-266) [229] provides a nice introduction. Muirhead [187] uses exterior products in developing results for the real variables case. I do not know of any reference that illustrates the procedure for applying exterior products to the complex variable case. However, physicists and nuclear engineers are likely to know of such a reference because they must deal with changes of variables for tensors.

C.1.1 Univariate Real Change of Variables

From basic calculus we recall the technique for change of variables. Let $\varphi(x)$ be some function of the variable $x$. Let $y = g(x)$ be a one-to-one transformation of $x$ to $y$ which is valid over some set of $x \in A$ and $y \in B$. So, $g : A \to B$. Let the inverse transformation be given by $x = f(y)$. Let $df/dy$ exist, be continuous over $B$, and take on nonzero values somewhere in $B$. Then the function $\psi(y)$ resulting from the change of variables is given by

$$\psi(y) = \varphi[f(y)]\,|J(x \to y)|$$

where $|J(x \to y)|$ is the absolute value of the determinant of the Jacobian matrix (univariate, in this case) given by

$$J(x \to y) = \frac{dx}{dy} = \frac{df(y)}{dy}$$
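A minimal numerical sketch of this rule, under an illustrative choice not taken from the text: $X$ uniform on $(0, 1)$ and $y = g(x) = e^x$, so that $f(y) = \ln y$ and $\psi(y) = 1/y$ on $B = (1, e)$. The derivative in $|J(x \to y)|$ is approximated by a central difference (an assumption made only to keep the sketch generic), and the checks confirm that $\psi$ integrates to one and that $P(Y \le 2) = P(X \le \ln 2) = \ln 2$.

```python
import math

def phi(x):
    """Illustrative density of X: uniform on (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def f(y):
    """Inverse transformation x = f(y) = ln y of y = g(x) = e^x."""
    return math.log(y)

def abs_jacobian(y, h=1e-6):
    """|J(x -> y)| = |df/dy|, approximated by a central difference."""
    return abs((f(y + h) - f(y - h)) / (2.0 * h))

def psi(y):
    """psi(y) = phi[f(y)] |J(x -> y)| on B = (1, e); zero elsewhere."""
    return phi(f(y)) * abs_jacobian(y) if 1.0 < y < math.e else 0.0

def integrate(func, a, b, n=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

print(integrate(psi, 1.0, math.e))              # total probability
print(integrate(psi, 1.0, 2.0), math.log(2.0))  # P(Y <= 2) versus ln 2
```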
C.1.2 Traditional Multivariate Change of Variables


Bendat and Piersol (p. 59) [38] discuss the change of variables involving multivalued functions of nice functions, like $\sin(x)$. The probability of having arisen from one pre-image set is identical to the probability of having arisen from any other pre-image set. Their discussion is a restricted case of the more general treatment given here. One of the nicest treatments of the change of variables technique is given by Hogg and Craig (pp. 147-152) [109]. This introduction is taken from their pages 151-152 with only a few notational changes. It is important enough and short enough to be included here rather than merely referenced.

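Let $x$ be an $n$-dimensional random variable, and let $\varphi(x)$ be the joint probability density function of $x$. Let $A$ be the $n$-dimensional space where $\varphi(x) > 0$, and consider the transformation $y = u(x)$ which maps $A$ onto $B$ in the $n$-dimensional space of $y$. To each point in $A$ there corresponds only one point in $B$. However, a particular point in $B$ may correspond to more than one point in $A$. Thus, the transformation might not be one-to-one.

Suppose that we can represent $A$ as the union of a finite number $k$ of disjoint sets $\{A_i\}$ so that $y = u(x)$ defines a one-to-one transformation of each $A_i$ onto $B$. Thus, to each point in $B$ there will correspond exactly one point in each of the $A_i$. Let $x = w_i(y)$, $i = 1, \ldots, k$, denote the inverse functions that map $B$ to $A_i$. Let

$$\frac{\partial w_i}{\partial y} = \begin{pmatrix}
\frac{\partial w_{1i}}{\partial y_1} & \frac{\partial w_{1i}}{\partial y_2} & \cdots & \frac{\partial w_{1i}}{\partial y_n} \\
\frac{\partial w_{2i}}{\partial y_1} & \frac{\partial w_{2i}}{\partial y_2} & \cdots & \frac{\partial w_{2i}}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial w_{ni}}{\partial y_1} & \frac{\partial w_{ni}}{\partial y_2} & \cdots & \frac{\partial w_{ni}}{\partial y_n}
\end{pmatrix}$$

where each partial derivative is continuous on $B$, and let the Jacobian $J_i(x \to y) = \det(\partial w_i/\partial y)$ be nonzero somewhere on $B$. From a consideration of the probability of the union of $k$ mutually exclusive events and applying the change of variable technique to the probability of each of these events, it can be seen that the joint probability density function of $y$ is given by

$$\psi(y) = \begin{cases} \sum_{i=1}^{k} \varphi[w_i(y)]\,|J_i(x \to y)| & y \in B \\ 0 & y \notin B \end{cases}$$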

Example of y = x^2 on a Shaped Sample Space
Blame me for this example.

Consider the sample space $\Omega = \{-2, -1, 0, 1, 2, 3, 4, 5, 6\}$ and the transformation $y = x^2$. We want to determine the new density function $\psi(y)$ when we are given the original sample space and density function $\varphi(x)$. The details are given in Table C.2. The point to observe here is that the points $(-3, -4, -5, -6)$ are not in the pre-image of $\psi(y)$. To solve the problem, the image and pre-image sets must be partitioned such that within a given partition it is possible to define a one-to-one transformation of variables.

Table C.2. Variables on a Shaped Sample Space

x              -2   -1    0    1    2    3    4    5    6
y = x^2         4    1    0    1    4    9   16   25   36
+√(x^2)         2    1    0    1    2    3    4    5    6
-√(x^2)        -2   -1    0   -1   -2   -3*  -4*  -5*  -6*

(* not a point of Ω)

For the example just given, a set of partitions would look like Figure C.1. Although a minimal set of partitions may exist, it is not necessary to use or even find it.

To generalize slightly, consider the continuous real random variable $x \in [-2, 6]$ having a probability density function $\varphi(x)$. Change variables with the transformation $y = x^2$. We want to find the new density function $\psi(y)$. Then the inverse transformations are given by

$$\begin{aligned}
w_1(y) &= +\sqrt{y}, & 0 < y < 4 \\
w_2(y) &= -\sqrt{y}, & 0 < y < 4 \\
w_3(y) &= +\sqrt{y}, & 4 < y < 36
\end{aligned}$$

The magnitude of the Jacobian for this transformation is computed as follows. For each branch,

$$|J_i(x \to y)| = \left|\frac{dw_i(y)}{dy}\right| = \frac{1}{2\sqrt{y}}, \qquad i = 1, 2, 3.$$

Figure C.1. Partitioning of Domain and Range for Multivalued Transformation

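The new density function is given by

$$\psi(y) = \begin{cases}
\varphi[w_1(y)]\,|J_1(x \to y)| + \varphi[w_2(y)]\,|J_2(x \to y)| & \text{for } 0 < y < 4 \\
\varphi[w_3(y)]\,|J_3(x \to y)| & \text{for } 4 < y < 36
\end{cases}$$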
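The continuous example can be checked numerically. The text leaves $\varphi$ general; the sketch below assumes, for illustration only, that $x$ is uniform on $[-2, 6]$, so $\varphi(x) = 1/8$. It assembles $\psi(y)$ from the three inverse branches and compares $P(1 \le Y \le 9)$ with the value $3/8$ obtained directly from $x$. The midpoint rule is used because it never samples the breakpoint $y = 4$ when that breakpoint is an interval end.

```python
import math

def phi(x):
    """Illustrative density for X: uniform on [-2, 6] (an assumption, not from the text)."""
    return 1.0 / 8.0 if -2.0 <= x <= 6.0 else 0.0

def psi(y):
    """Density of Y = X^2 assembled from the inverse branches w1, w2, w3."""
    if 0.0 < y < 4.0:
        root = math.sqrt(y)
        jac = 1.0 / (2.0 * root)              # |dw/dy| for w1 = +sqrt(y) and w2 = -sqrt(y)
        return phi(root) * jac + phi(-root) * jac
    if 4.0 < y < 36.0:
        root = math.sqrt(y)
        return phi(root) / (2.0 * root)       # only w3 = +sqrt(y); -sqrt(y) < -2 is outside the support
    return 0.0

def midpoint(func, a, b, n=20000):
    """Midpoint rule: func is evaluated only at interior points of [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def prob_y_between(a, b):
    """P(a <= Y <= b) for 0 < a < b <= 36, splitting the integral at y = 4."""
    pieces = [(lo, hi) for lo, hi in ((a, min(b, 4.0)), (max(a, 4.0), b)) if lo < hi]
    return sum(midpoint(psi, lo, hi) for lo, hi in pieces)

# Directly from X: P(1 <= X^2 <= 9) = P(1 <= X <= 3) + P(-2 <= X <= -1) = 3/8.
print(prob_y_between(1.0, 9.0))
```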


C.2 Exterior (Wedge) Products


Exterior products are also known as wedge products because of the shape of the operator used to denote them, $\wedge$. Exterior products are most often studied by engineers and physicists when working with tensor calculus. The theory of exterior products is grounded in k-forms and differential geometry. I will hide my ignorance of those areas by not explaining the theory of exterior products to you. Muirhead's text (pp. 50-57) [187] is a nice reference for those whose use of exterior products is limited to the need to compute complicated Jacobians. Another nice reference is Rudin's undergraduate text [229] on analysis. The more adventurous can consult Spivak's popular short book [254] on calculus on manifolds, and the truly bold can consult Spivak's more comprehensive work [255] on differential geometry. The reason to use exterior products is to make difficult Jacobians much easier to compute. While the use of exterior products is nice in the real variables case, their use for all but the simplest Jacobians is nearly mandatory in the complex variables case.

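An exterior product $\wedge$ is an operator that maps a pair of differentials $dx$ and $dy$ into $\mathbb{R}$ and has the properties listed below:

$$\begin{aligned}
dx \wedge dy &= -dy \wedge dx \\
dx \wedge dx &= 0 \\
dx \wedge \alpha\,dy &= \alpha\,dx \wedge dy, \quad \text{for constant } \alpha \\
dx \wedge (dy \wedge dz) &= (dx \wedge dy) \wedge dz \\
dx \wedge (dy + dz) &= (dx \wedge dy) + (dx \wedge dz)
\end{aligned}$$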


The remainder of this section was supplied by me. The first property is useful for combining cross-product terms between the real and imaginary parts of differentials when working with complex variables. The second property will reduce our work on matrices that have some structure that repeats the use of any particular variable or its complex conjugate.

It is very useful to always consider a probability density function as a differential. As a reminder, it is useful to always write the differential we are using as a part of the density function. For example, instead of writing $\varphi(x)$, we are less likely to make conceptual mistakes by writing $\varphi(x)\,dx$. Let $x$ be a complex variable. When it is viewed in $\mathbb{R}^2$, index the real part with $R$ and the imaginary part with $I$. Then we can write $dx$ as $|dx_R\,dx_I|$, where we take the magnitude or absolute value since our interest is in the scaling of differential volumes. In terms of the exterior product, we write $dx \equiv |dx_R \wedge dx_I|$.

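Let us examine the differential $dx \wedge dy = (dx \wedge dy)_R + i(dx \wedge dy)_I$.

$$\begin{aligned}
dx \wedge dy &= (dx_R + i\,dx_I) \wedge (dy_R + i\,dy_I) \\
&= dx_R \wedge dy_R + dx_R \wedge (i\,dy_I) + (i\,dx_I) \wedge dy_R + (i\,dx_I) \wedge (i\,dy_I) \\
&= dx_R \wedge dy_R + i\,dx_R \wedge dy_I + i\,dx_I \wedge dy_R + i^2\,dx_I \wedge dy_I \\
&= dx_R \wedge dy_R - dx_I \wedge dy_I + i\,dx_R \wedge dy_I + i\,dx_I \wedge dy_R
\end{aligned}$$

Therefore

$$\begin{aligned}
(dx \wedge dy)_R &= dx_R \wedge dy_R - dx_I \wedge dy_I \\
(dx \wedge dy)_I &= dx_R \wedge dy_I + dx_I \wedge dy_R
\end{aligned}$$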

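Now, let the differential $dx^*$ be the complex conjugate of $dx$. So, $dx = dx_R + i\,dx_I$ and

$$dx^* = (dx_R + i\,dx_I)^* = dx_R - i\,dx_I$$

Taking $dy = dx^*$ in the previous result, this leads to $dy_R = dx_R$ and $dy_I = -dx_I$.

$$\begin{aligned}
dx \wedge dx^* &= (dx_R + i\,dx_I) \wedge (dx_R - i\,dx_I) \\
&= \underbrace{dx_R \wedge dx_R}_{0} + dx_R \wedge (-i\,dx_I) + (i\,dx_I) \wedge dx_R + \underbrace{(i\,dx_I) \wedge (-i\,dx_I)}_{0} \\
&= -i\,dx_R \wedge dx_I + i\,dx_I \wedge dx_R \\
&= -2i\,dx_R \wedge dx_I
\end{aligned}$$

Therefore $(dx \wedge dx^*)_R = 0$ and

$$(dx \wedge dx^*)_I = -2\,(dx_R \wedge dx_I)$$

Notice that

$$dx \wedge dx^* = -dx^* \wedge dx$$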
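The wedge-product rules listed above are mechanical enough to encode directly. The following minimal sketch is a toy representation, not a general exterior-algebra library: it stores a 1-form as a dictionary of coefficients on the real basis differentials, uses only antisymmetry, $du \wedge du = 0$, and linearity, and reproduces $dx \wedge dx^* = -2i\,dx_R \wedge dx_I$ together with the real and imaginary parts of $dx \wedge dy$ derived above.

```python
def wedge(a, b):
    """Wedge product of two 1-forms.

    A 1-form is a dict mapping a basis differential to its (complex) coefficient.
    Basis differentials are tuples such as ("x", 0) for dx_R and ("x", 1) for dx_I.
    The result is a 2-form stored as {(u, v): coefficient} with u < v, using the
    rules du ^ du = 0 and dv ^ du = -du ^ dv.
    """
    out = {}
    for u, cu in a.items():
        for v, cv in b.items():
            if u == v:
                continue                                   # du ^ du = 0
            key, sign = ((u, v), 1) if u < v else ((v, u), -1)
            out[key] = out.get(key, 0) + sign * cu * cv    # linearity in each argument
    return {k: c for k, c in out.items() if c != 0}

dx = {("x", 0): 1, ("x", 1): 1j}          # dx  = dx_R + i dx_I
dx_conj = {("x", 0): 1, ("x", 1): -1j}    # dx* = dx_R - i dx_I
dy = {("y", 0): 1, ("y", 1): 1j}          # dy  = dy_R + i dy_I

print(wedge(dx, dx_conj))   # coefficient -2j on (dx_R, dx_I), i.e. -2i dx_R ^ dx_I
```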

C.2.1 Example of Wedge Products in Rectangular to Polar Coordinate Change of Variables

This was supplied by me. I expect that a senior in physics or nuclear engineering could easily produce this example.

Let $z = x + iy = re^{i\theta}$ and let $f(z) = f(x, y)$. We want to change variables from $(x, y)$ to $(r, \theta)$. Find the Jacobian of the transformation. In other words, find $|J|$ where

$$dx \wedge dy = |J|\,dr \wedge d\theta$$

The result is the familiar $dx\,dy = r\,dr\,d\theta$. This result is often developed in undergraduate calculus courses without the benefit of exterior products. Notice how much simpler the development is here. With this approach there is no magic or serendipitous hindsight required.

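We begin with $|z|^2 = x^2 + y^2 = r^2$, and from

$$x + iy = r(\cos\theta + i\sin\theta)$$

we see $x = r\cos\theta$ and $y = r\sin\theta$. Thus

$$\tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{y}{x}$$

We now take the differentials.

$$\begin{aligned}
d\tan\theta &= \left(\frac{1}{\cos^2\theta}\right)d\theta = \frac{1}{x}\,dy - \frac{y}{x^2}\,dx = -\frac{1}{x}\left(\frac{y}{x}\,dx - dy\right) \\
2r\,dr &= 2x\,dx + 2y\,dy \quad\Longrightarrow\quad dr = (x^2 + y^2)^{-1/2}(x\,dx + y\,dy)
\end{aligned}$$

Taking the exterior (wedge) product, we see

$$\begin{aligned}
2r\,dr \wedge \left(\frac{1}{\cos^2\theta}\right)d\theta &= \frac{2r}{\cos^2\theta}\,dr \wedge d\theta \\
&= (2x\,dx + 2y\,dy) \wedge \left[-\frac{1}{x}\left(\frac{y}{x}\,dx - dy\right)\right] \\
&= -\frac{2}{x}(x\,dx + y\,dy) \wedge \left(\frac{y}{x}\,dx - dy\right) \\
&= -\frac{2}{x}\Big[\underbrace{y\,dx \wedge dx}_{0} - x\,dx \wedge dy + \frac{y^2}{x}\underbrace{dy \wedge dx}_{-dx \wedge dy} - y\underbrace{dy \wedge dy}_{0}\Big] \\
&= -\frac{2}{x}\left[-x - \frac{y^2}{x}\right]dx \wedge dy = \frac{2}{x^2}\left[x^2 + y^2\right]dx \wedge dy
\end{aligned}$$

Thus

$$dx \wedge dy = \frac{x^2}{2(x^2 + y^2)} \cdot \frac{2r}{\cos^2\theta}\,(dr \wedge d\theta)$$

Recall that $\cos^2\theta = x^2/r^2$, which means

$$dx \wedge dy = \frac{x^2}{2r^2} \cdot 2r \cdot \frac{r^2}{x^2}\,dr \wedge d\theta = r\,dr \wedge d\theta.$$

Therefore $|J| = r$ and $dx\,dy = r\,dr\,d\theta$.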
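As a numerical cross-check of $|J| = r$, the determinant of $\partial(x, y)/\partial(r, \theta)$ can be approximated with central differences. This is a minimal sketch; the step size and the sample points are arbitrary choices.

```python
import math

def to_cartesian(r, theta):
    """(r, theta) -> (x, y) = (r cos theta, r sin theta)."""
    return r * math.cos(theta), r * math.sin(theta)

def polar_jacobian(r, theta, h=1e-6):
    """det of d(x, y)/d(r, theta), approximated by central differences."""
    x_rp, y_rp = to_cartesian(r + h, theta)
    x_rm, y_rm = to_cartesian(r - h, theta)
    x_tp, y_tp = to_cartesian(r, theta + h)
    x_tm, y_tm = to_cartesian(r, theta - h)
    dx_dr, dy_dr = (x_rp - x_rm) / (2 * h), (y_rp - y_rm) / (2 * h)
    dx_dt, dy_dt = (x_tp - x_tm) / (2 * h), (y_tp - y_tm) / (2 * h)
    return dx_dr * dy_dt - dx_dt * dy_dr

for r, theta in [(0.5, 0.3), (2.0, 2.0), (3.7, -1.2)]:
    print(r, polar_jacobian(r, theta))   # the second value should be close to r
```

The determinant form does not need the restriction $\cos\theta \neq 0$ that the division by $x$ imposes in the wedge-product derivation.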

C.3 Jacobians for Complex Change of Variables

In this section we develop those Jacobians that do not require the use of exterior products. In the next section we will treat those which do require exterior products. We begin with the most basic differential of a product of two matrices.

Theorem 21 Let $X \in \mathbb{C}^{n \times m}$ and $Y \in \mathbb{C}^{m \times p}$. Then

$$d(XY) = X[dY] + [dX]Y$$

and

$$(dX^n) = \sum_{k=0}^{n-1} X^k (dX) X^{n-1-k} = nX^{n-1}(dX)$$

when $n = m$. The first result is a complexification of the statement of Muirhead's problem 2.1 [187], given without proof. It is used by Srivastava [256] in his derivation of the density for the complex Wishart distribution. I do not have a record of the pedigree of the second result.


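Proof.

$$XY = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}\begin{pmatrix} y_{11} & \cdots & y_{1p} \\ \vdots & & \vdots \\ y_{m1} & \cdots & y_{mp} \end{pmatrix} = \left(\sum_{k=1}^{m} x_{ik}y_{kj}\right)$$

so that

$$d(XY) = \left(\sum_{k=1}^{m}\left[(dx_{ik})y_{kj} + x_{ik}(dy_{kj})\right]\right) = (dX)Y + X(dY)$$

Thus, applying this result repeatedly to $X^n = X \cdot X^{n-1}$,

$$(dX^n) = \sum_{k=0}^{n-1} X^k (dX) X^{n-1-k} = nX^{n-1}(dX)$$

□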


Note that $(dX)$ is a scalar, whereas $\partial Y/\partial X$ is an $(nq) \times (mp)$ matrix when $Y$ is $q \times p$ and $X$ is $n \times m$, and $dX$ is a matrix of differentials $(dX_{ij})$.
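The product rule can be checked numerically by perturbing $X$ and $Y$ along directions $dX$ and $dY$: the difference quotient $[(X + \epsilon\,dX)(Y + \epsilon\,dY) - XY]/\epsilon$ differs from $X\,dY + dX\,Y$ only by $\epsilon\,dX\,dY$. A minimal sketch with random complex matrices; the step size and the helper names are illustrative assumptions.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y, s=1.0):
    """Elementwise X + s*Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def rand_matrix(rows, cols, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]

def product_rule_gap(n=3, m=2, p=4, eps=1e-6, seed=0):
    """Largest entrywise gap between [(X + eps dX)(Y + eps dY) - XY]/eps and X dY + dX Y."""
    rng = random.Random(seed)
    X, dX = rand_matrix(n, m, rng), rand_matrix(n, m, rng)
    Y, dY = rand_matrix(m, p, rng), rand_matrix(m, p, rng)
    XY = matmul(X, Y)
    perturbed = matmul(add(X, dX, eps), add(Y, dY, eps))
    quotient = [[(a - b) / eps for a, b in zip(ra, rb)] for ra, rb in zip(perturbed, XY)]
    predicted = add(matmul(X, dY), matmul(dX, Y))
    return max(abs(a - b) for ra, rb in zip(quotient, predicted) for a, b in zip(ra, rb))

print(product_rule_gap())   # small: the neglected term is eps * dX dY
```

Shrinking `eps` shrinks the gap proportionally, which is the signature of a correct first-order differential.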


Theorem 22 Let $x$ and $y$ both be column vectors in $\mathbb{C}^n$, let $B \in \mathbb{C}^{n \times n}$ be such that $\mathrm{Re}(B)^{-1}$ exists, and let $B^{-1} = A \in \mathbb{C}^{n \times n}$ such that $A$ is unstructured. Let $y = Ax$ be a complex linear transformation from $x$ to $y$. Then $|J(x \to y)| = |\det(A^{-1})|^2 = |\det(B)|^2$. This is a complexification of Muirhead's theorem 2.1.1, stated in a slightly different form.


Proof. Muirhead gave a proof for the real-variables case which used exterior products. The proof I have provided follows a more traditional approach. This first proof of a Jacobian will dwell more on basics than future proofs. It is important to see the details once.

When forming the Jacobian of a change of variables in the complex case, recall that each part, the real part and the imaginary part, of the complex variable undergoes a change. Suppose we have some function

$$p_x(x_1, x_2, \ldots, x_n) = p_x(x_{R1}, x_{I1}, \ldots, x_{Rn}, x_{In})$$

where the subscripts $R$ and $I$ denote the real and imaginary parts of the associated complex variable. The goal is to find a function

$$p_y(y_1, y_2, \ldots, y_n) = p_y(y_{R1}, y_{I1}, \ldots, y_{Rn}, y_{In})$$

which is related to $p_x$ by the mappings

$$\begin{aligned}
y_{R1} &= g_{R1}(x_{R1}, x_{I1}, \ldots, x_{Rn}, x_{In}) \\
y_{I1} &= g_{I1}(x_{R1}, x_{I1}, \ldots, x_{Rn}, x_{In}) \\
&\;\;\vdots \\
y_{Rn} &= g_{Rn}(x_{R1}, x_{I1}, \ldots, x_{Rn}, x_{In}) \\
y_{In} &= g_{In}(x_{R1}, x_{I1}, \ldots, x_{Rn}, x_{In})
\end{aligned}$$

Let the inverse mappings be given by

$$\begin{aligned}
x_{R1} &= f_{R1}(y_{R1}, y_{I1}, \ldots, y_{Rn}, y_{In}) \\
x_{I1} &= f_{I1}(y_{R1}, y_{I1}, \ldots, y_{Rn}, y_{In}) \\
&\;\;\vdots \\
x_{Rn} &= f_{Rn}(y_{R1}, y_{I1}, \ldots, y_{Rn}, y_{In}) \\
x_{In} &= f_{In}(y_{R1}, y_{I1}, \ldots, y_{Rn}, y_{In})
\end{aligned}$$

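If the partial derivatives of $f_{Rj}$ and $f_{Ij}$ with respect to each of the $y_{Rk}$ and $y_{Ik}$ exist, then the Jacobian is the determinant of the matrix given by

$$\begin{pmatrix}
\frac{\partial f_{R1}}{\partial y_{R1}} & \frac{\partial f_{I1}}{\partial y_{R1}} & \cdots & \frac{\partial f_{Rn}}{\partial y_{R1}} & \frac{\partial f_{In}}{\partial y_{R1}} \\
\frac{\partial f_{R1}}{\partial y_{I1}} & \frac{\partial f_{I1}}{\partial y_{I1}} & \cdots & \frac{\partial f_{Rn}}{\partial y_{I1}} & \frac{\partial f_{In}}{\partial y_{I1}} \\
\vdots & \vdots & & \vdots & \vdots \\
\frac{\partial f_{R1}}{\partial y_{Rn}} & \frac{\partial f_{I1}}{\partial y_{Rn}} & \cdots & \frac{\partial f_{Rn}}{\partial y_{Rn}} & \frac{\partial f_{In}}{\partial y_{Rn}} \\
\frac{\partial f_{R1}}{\partial y_{In}} & \frac{\partial f_{I1}}{\partial y_{In}} & \cdots & \frac{\partial f_{Rn}}{\partial y_{In}} & \frac{\partial f_{In}}{\partial y_{In}}
\end{pmatrix}$$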

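Consider the inverse transformation given by

$$\begin{pmatrix} x_{R1} + ix_{I1} \\ \vdots \\ x_{Rn} + ix_{In} \end{pmatrix} = \begin{pmatrix} B_{R11} + iB_{I11} & \cdots & B_{R1n} + iB_{I1n} \\ \vdots & & \vdots \\ B_{Rn1} + iB_{In1} & \cdots & B_{Rnn} + iB_{Inn} \end{pmatrix}\begin{pmatrix} y_{R1} + iy_{I1} \\ \vdots \\ y_{Rn} + iy_{In} \end{pmatrix}$$

Then

$$x_{Rk} + ix_{Ik} = \sum_{j=1}^{n}(B_{Rkj} + iB_{Ikj})(y_{Rj} + iy_{Ij}) = \sum_{j=1}^{n}\left[(B_{Rkj}y_{Rj} - B_{Ikj}y_{Ij}) + i(B_{Ikj}y_{Rj} + B_{Rkj}y_{Ij})\right]$$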
i=i 
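By equating the real parts with each other, and doing likewise with the imaginary parts, we get

$$x_{Rk} = \sum_{j=1}^{n}(B_{Rkj}y_{Rj} - B_{Ikj}y_{Ij}), \qquad x_{Ik} = \sum_{j=1}^{n}(B_{Ikj}y_{Rj} + B_{Rkj}y_{Ij})$$

Thus the partial derivatives are given by

$$\begin{pmatrix} \frac{\partial x_{Rk}}{\partial y_{Rj}} & \frac{\partial x_{Rk}}{\partial y_{Ij}} \\ \frac{\partial x_{Ik}}{\partial y_{Rj}} & \frac{\partial x_{Ik}}{\partial y_{Ij}} \end{pmatrix} = \begin{pmatrix} B_{Rkj} & -B_{Ikj} \\ B_{Ikj} & B_{Rkj} \end{pmatrix}$$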

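In the transformation, it is the absolute value of the Jacobian which we need to evaluate. Knowing this relaxes the bookkeeping required while reforming the matrix of the inverse transformation. We seek the form of

$$|J| = \left|\det\begin{pmatrix}
B_{R11} & B_{I11} & B_{R21} & B_{I21} & \cdots & B_{Rn1} & B_{In1} \\
-B_{I11} & B_{R11} & -B_{I21} & B_{R21} & \cdots & -B_{In1} & B_{Rn1} \\
B_{R12} & B_{I12} & B_{R22} & B_{I22} & \cdots & B_{Rn2} & B_{In2} \\
-B_{I12} & B_{R12} & -B_{I22} & B_{R22} & \cdots & -B_{In2} & B_{Rn2} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
B_{R1n} & B_{I1n} & B_{R2n} & B_{I2n} & \cdots & B_{Rnn} & B_{Inn} \\
-B_{I1n} & B_{R1n} & -B_{I2n} & B_{R2n} & \cdots & -B_{Inn} & B_{Rnn}
\end{pmatrix}\right|$$

Notice that the indexing of the block matrices

$$B_{kj} = \begin{pmatrix} B_{Rkj} & B_{Ikj} \\ -B_{Ikj} & B_{Rkj} \end{pmatrix}$$

in this equation for the Jacobian is the transpose of the indexing of the matrix $B$ when written in block form.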

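By exchanging rows, we get

$$|J| = \left|\det\begin{pmatrix}
B_{R11} & B_{I11} & B_{R21} & B_{I21} & \cdots & B_{Rn1} & B_{In1} \\
B_{R12} & B_{I12} & B_{R22} & B_{I22} & \cdots & B_{Rn2} & B_{In2} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
B_{R1n} & B_{I1n} & B_{R2n} & B_{I2n} & \cdots & B_{Rnn} & B_{Inn} \\
-B_{I11} & B_{R11} & -B_{I21} & B_{R21} & \cdots & -B_{In1} & B_{Rn1} \\
-B_{I12} & B_{R12} & -B_{I22} & B_{R22} & \cdots & -B_{In2} & B_{Rn2} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
-B_{I1n} & B_{R1n} & -B_{I2n} & B_{R2n} & \cdots & -B_{Inn} & B_{Rnn}
\end{pmatrix}\right|$$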

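Notice that all the negative elements are now in the bottom half of the matrix. By exchanging columns, we get

$$|J| = \left|\det\begin{pmatrix}
B_{R11} & B_{R21} & \cdots & B_{Rn1} & B_{I11} & B_{I21} & \cdots & B_{In1} \\
B_{R12} & B_{R22} & \cdots & B_{Rn2} & B_{I12} & B_{I22} & \cdots & B_{In2} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
B_{R1n} & B_{R2n} & \cdots & B_{Rnn} & B_{I1n} & B_{I2n} & \cdots & B_{Inn} \\
-B_{I11} & -B_{I21} & \cdots & -B_{In1} & B_{R11} & B_{R21} & \cdots & B_{Rn1} \\
-B_{I12} & -B_{I22} & \cdots & -B_{In2} & B_{R12} & B_{R22} & \cdots & B_{Rn2} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
-B_{I1n} & -B_{I2n} & \cdots & -B_{Inn} & B_{R1n} & B_{R2n} & \cdots & B_{Rnn}
\end{pmatrix}\right|$$

Each exchange of two rows or two columns changes only the sign of the determinant, so the absolute value is unaffected.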
Now we see that all the negative elements are in the bottom left quadrant. We have

$$|J| = \left|\det\begin{pmatrix} B_R^T & B_I^T \\ -B_I^T & B_R^T \end{pmatrix}\right| = \left|\det\begin{pmatrix} B_R & -B_I \\ B_I & B_R \end{pmatrix}\right|$$

since the determinant of a matrix is equal to the determinant of its transpose. Using lemma 45, this becomes $|J| = |\det[B_R]\det[B_R + B_I B_R^{-1} B_I]|$, where we need $B_R^{-1}$ to exist. Then

$$\begin{aligned}
|J| &= |\det[B_R]\det[B_R + B_I B_R^{-1} B_I]| = |\det[B_R]\det[(I + B_I B_R^{-1} B_I B_R^{-1})B_R]| \\
&= |\{\det[B_R]\}^2\det[I + (B_I B_R^{-1})^2]| \\
&= |\{\det[B_R]\}^2\det[\{I + iB_I B_R^{-1}\}\{I - iB_I B_R^{-1}\}]| \\
&= |\{\det[I + iB_I B_R^{-1}]\det[B_R]\}\{\det[I - iB_I B_R^{-1}]\det[B_R]\}| \\
&= |\{\det[B_R + iB_I]\}\{\det[B_R - iB_I]\}| = |\{\det[B]\}\{\det[B]\}^*| = |\det[B]|^2
\end{aligned}$$


We also know that

$$|\det[B]|^2 = |\{\det[B]\}\{\det[B]\}^*| = |\{\det[B]\}\{\det[B^*]\}|$$

by lemma 42, which becomes $|\det[BB^*]|$.

To summarize, given a complex linear transformation $y = Ax$, where each of $y$, $A$, and $x$ is complex, with inverse transformation

$$x = A^{-1}y = By = (B_R + iB_I)y,$$

the Jacobian of the transformation is given by $|J(x \to y)| = |\det B|^2 = |\det A|^{-2}$. □
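The conclusion can be checked numerically: build the real $2n \times 2n$ matrix $\begin{pmatrix} B_R & -B_I \\ B_I & B_R \end{pmatrix}$ and compare the magnitude of its determinant with $|\det B|^2$. A minimal sketch in plain Python with a hand-written Gaussian-elimination determinant; the helper names are illustrative. The tests also use matrices whose first column is all ones, the case treated in Proposition 30.

```python
import random

def det(M):
    """Determinant by Gaussian elimination with partial pivoting (real or complex entries)."""
    a = [list(row) for row in M]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))   # pivot row
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d                                         # a row exchange flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def real_form(B):
    """The real 2n x 2n matrix [[B_R, -B_I], [B_I, B_R]] of a complex n x n matrix B."""
    BR = [[z.real for z in row] for row in B]
    BI = [[z.imag for z in row] for row in B]
    top = [r + [-v for v in i] for r, i in zip(BR, BI)]
    bottom = [i + r for r, i in zip(BR, BI)]
    return top + bottom

def random_complex(n, rng, first_column_ones=False):
    B = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
    if first_column_ones:
        for row in B:
            row[0] = 1 + 0j        # the constrained case of Proposition 30
    return B

B = random_complex(4, random.Random(3))
print(abs(det(real_form(B))), abs(det(B)) ** 2)   # the two numbers agree up to rounding
```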


Proposition 30 Let $x = By$ be a change of complex variables such that $x, y \in \mathbb{C}^n$ and $B \in \mathbb{C}^{n \times n}$, except that the first column of $B$ is constrained to be a column of all ones. Then $|J(x \to y)| = |\det B|^2$ and $|J(y \to x)| = |\det B|^{-2}$.


Proof. This issue arose in deriving a complex version of Anderson's (pp. 522-530) [26] theorem 13.2.2. There was concern over the possibility of the Jacobian being anomalously zero due to the all-zero entries in the imaginary part of the first column of $B$. The important message of this proof is that the concern is unfounded.

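Let

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 1 & B_{12} & \cdots & B_{1n} \\ 1 & B_{22} & \cdots & B_{2n} \\ \vdots & \vdots & & \vdots \\ 1 & B_{n2} & \cdots & B_{nn} \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

Then

$$x_{Rk} + ix_{Ik} = y_{R1} + \sum_{j=2}^{n}(B_{Rkj}y_{Rj} - B_{Ikj}y_{Ij}) + i\left[y_{I1} + \sum_{j=2}^{n}(B_{Ikj}y_{Rj} + B_{Rkj}y_{Ij})\right]$$

where the subscripts $R$ and $I$ refer to the real and imaginary parts of their complex variables. The Jacobian matrix for the change of variables is given by

$$\begin{pmatrix} \frac{\partial x_R}{\partial y_R} & \frac{\partial x_R}{\partial y_I} \\ \frac{\partial x_I}{\partial y_R} & \frac{\partial x_I}{\partial y_I} \end{pmatrix} = \begin{pmatrix}
1 & B_{R12} & \cdots & B_{R1n} & 0 & -B_{I12} & \cdots & -B_{I1n} \\
1 & B_{R22} & \cdots & B_{R2n} & 0 & -B_{I22} & \cdots & -B_{I2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & B_{Rn2} & \cdots & B_{Rnn} & 0 & -B_{In2} & \cdots & -B_{Inn} \\
0 & B_{I12} & \cdots & B_{I1n} & 1 & B_{R12} & \cdots & B_{R1n} \\
0 & B_{I22} & \cdots & B_{I2n} & 1 & B_{R22} & \cdots & B_{R2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & B_{In2} & \cdots & B_{Inn} & 1 & B_{Rn2} & \cdots & B_{Rnn}
\end{pmatrix} = \begin{pmatrix} B_R & -B_I \\ B_I & B_R \end{pmatrix}$$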

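As shown in theorem 22, the determinant of this matrix is found by the partitioned matrix determinant lemma 45, if the determinant exists. Thus

$$\det\begin{pmatrix} B_R & -B_I \\ B_I & B_R \end{pmatrix} = \det(B_R)\det(B_R + B_I B_R^{-1} B_I)$$

Let $A_R = B_R^{-1}$. Then

$$B_I B_R^{-1} B_I = \begin{pmatrix}
0 & \sum_{j=2}^{n} B_{I1j}\sum_{i=1}^{n} A_{Rji}B_{Ii2} & \cdots & \sum_{j=2}^{n} B_{I1j}\sum_{i=1}^{n} A_{Rji}B_{Iin} \\
\vdots & \vdots & & \vdots \\
0 & \sum_{j=2}^{n} B_{Inj}\sum_{i=1}^{n} A_{Rji}B_{Ii2} & \cdots & \sum_{j=2}^{n} B_{Inj}\sum_{i=1}^{n} A_{Rji}B_{Iin}
\end{pmatrix}$$

Even though column one is zero, when we consider $\det(B_R + B_I B_R^{-1} B_I)$ we observe that the determinant is not necessarily zero, because the first column of $B_R + B_I B_R^{-1} B_I$ is the column of ones inherited from $B_R$. Thus we can claim that, except for pathological cases,

$$\left|\det\begin{pmatrix} B_R & -B_I \\ B_I & B_R \end{pmatrix}\right| = |\det B|^2. \quad \square$$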


\B,  Br  ) 

Lemma 2 Let $Y = TL$ be a change of complex variables where $Y$ and $L$ are in $\mathbb{C}^{n \times p}$ and $T$ is upper triangular in $\mathbb{C}^{n \times n}$. Then the Jacobians of the transformations between $Y$ and $L$ are $|J(Y \to L)| = |\det T|^{2p} = \prod_{k=1}^{n}|T_{kk}|^{2p}$ and $|J(L \to Y)| = \prod_{k=1}^{n}|T_{kk}|^{-2p}$.

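Proof. We begin by treating the matrices as the sum of their real and imaginary parts.

$$Y_R + iY_I = (T_R + iT_I)(L_R + iL_I)$$

Then matching the real and imaginary parts we get

$$Y_R = T_R L_R - T_I L_I$$

and

$$Y_I = T_I L_R + T_R L_I$$

Note that this is merely the Cauchy-Riemann condition for the existence of the complex derivative at the points given by $L$. It is more apparent in the form

$$\frac{\partial Y_R}{\partial L_R} = \frac{\partial Y_I}{\partial L_I} \quad\text{and}\quad \frac{\partial Y_R}{\partial L_I} = -\frac{\partial Y_I}{\partial L_R}.$$

To find the Jacobian, we will examine

$$\left|\det\begin{pmatrix} \frac{\partial Y_R}{\partial L_R} & \frac{\partial Y_R}{\partial L_I} \\ \frac{\partial Y_I}{\partial L_R} & \frac{\partial Y_I}{\partial L_I} \end{pmatrix}\right| = \left|\det\begin{pmatrix} T_R \otimes I_p & -T_I \otimes I_p \\ T_I \otimes I_p & T_R \otimes I_p \end{pmatrix}\right| = \left|\det\begin{pmatrix} T_R & -T_I \\ T_I & T_R \end{pmatrix}\right|^p$$

by lemma 49. We now apply the partitioned matrix determinant lemma 45 to get

$$\begin{aligned}
|\det(T_R)\det(T_R + T_I T_R^{-1} T_I)|^p &= |[\det(T_R)]^2\det(I + T_R^{-1}T_I T_R^{-1}T_I)|^p \\
&= |[\det(T_R)]^2\det[I + (T_R^{-1}T_I)^2]|^p = |\det(T_R)\det(I + iT_R^{-1}T_I)|^{2p}
\end{aligned}$$

By proposition 65 this is

$$|\det(T_R + iT_I)|^{2p} = |\det T|^{2p}$$

since $T_R$ is conformable with the matrices $I$ and $T_R^{-1}T_I$. The last term is the magnitude of the determinant of the complex triangular matrix $T$ raised to the power $2p$, and since $T$ is triangular, $|\det T| = \prod_{k=1}^{n}|T_{kk}|$. Thus the Jacobian of the change of variables $Y = TL$ is given by $|J(Y \to L)| = |\det T|^{2p}$ and $|J(L \to Y)| = |\det T|^{-2p}$. □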
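Lemma 2 can be checked numerically by writing $L \mapsto TL$ as a real-linear map on $\mathbb{R}^{2np}$ and computing the determinant of its real Jacobian matrix. Because the map is linear, each column of that matrix is just the image of one real basis direction, so no finite differences are needed. A sketch in plain Python; the matrix $T$ is an arbitrary illustrative choice and the helper names are hypothetical.

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in M]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def real_jacobian(linear_map, rows, cols):
    """Real Jacobian matrix of a real-linear map C^{rows x cols} -> C^{rows x cols}.

    Column j is the image of the j-th real basis direction (a 1 or an i in a single
    entry), flattened as [real parts..., imaginary parts...]. Because the map is
    linear, these columns are exactly its partial derivatives.
    """
    columns = []
    for unit in (1.0, 1j):
        for i in range(rows):
            for j in range(cols):
                E = [[0j] * cols for _ in range(rows)]
                E[i][j] = unit
                out = [z for row in linear_map(E) for z in row]
                columns.append([z.real for z in out] + [z.imag for z in out])
    size = len(columns)
    return [[columns[c][r] for c in range(size)] for r in range(size)]

# Hypothetical upper triangular T in C^{3x3}; L ranges over C^{3x2}, so p = 2.
T = [[1.1 + 0.4j, 0.5, -1j],
     [0.0, 0.7 - 0.2j, 2.0],
     [0.0, 0.0, -0.9 + 0.3j]]
p = 2
J = real_jacobian(lambda L: matmul(T, L), rows=3, cols=p)
predicted = 1.0
for k in range(3):
    predicted *= abs(T[k][k]) ** (2 * p)   # prod_k |T_kk|^{2p} = |det T|^{2p}
print(abs(det(J)), predicted)
```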


Lemma 3 Let $Y = TA$ be a change of complex variables between $Y$ and $A$ where $Y \in \mathbb{C}^{n \times p}$, $A \in \mathbb{C}^{n \times p}$, and $T$ is lower triangular in $\mathbb{C}^{n \times n}$ with positive real elements on the diagonal. Then $|J(Y \to A)| = \prod_{k=1}^{n} T_{kk}^{2p}$ and $|J(A \to Y)| = \prod_{k=1}^{n} T_{kk}^{-2p}$.

Proof. By lemma 2, we know $|J(Y \to A)| = |\det T|^{2p}$. At this point in lemma 2, no use of the fact that $T$ is triangular has been made. Since $T$ is lower triangular with positive real diagonal elements, $\det T = \prod_{k=1}^{n} T_{kk}$ and therefore $|J(Y \to A)| = \prod_{k=1}^{n} T_{kk}^{2p}$ and $|J(A \to Y)| = \prod_{k=1}^{n} T_{kk}^{-2p}$. □


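Lemma 4 Let $Y = AT$ be a change of complex variables between $Y$ and $A$ where $Y \in \mathbb{C}^{n \times p}$, $A \in \mathbb{C}^{n \times p}$, and $T$ is upper triangular in $\mathbb{C}^{p \times p}$ with positive real elements on the diagonal. Then $|J(Y \to A)| = \prod_{k=1}^{p} T_{kk}^{2n}$ and $|J(A \to Y)| = \prod_{k=1}^{p} T_{kk}^{-2n}$.

Proof. $Y = AT$ implies that the transpose is $Y^T = T^T A^T$. Note that $Y^T \in \mathbb{C}^{p \times n}$, $A^T \in \mathbb{C}^{p \times n}$, and $T^T \in \mathbb{C}^{p \times p}$. The matrix $T^T$ is lower triangular. By lemma 3, $|J(Y^T \to A^T)| = \prod_{k=1}^{p} T_{kk}^{2n}$. Transposition merely reorders the variables, which can change only the sign of the Jacobian determinant. Thus $|J(Y^T \to A^T)| = |J(Y \to A)|$. Therefore $|J(Y \to A)| = \prod_{k=1}^{p} T_{kk}^{2n}$ and $|J(A \to Y)| = \prod_{k=1}^{p} T_{kk}^{-2n}$. □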


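Lemma 5 Let $Y = AT$ be a change of complex variables between $Y$ and $A$, where $Y \in \mathbb{C}^{n\times p}$, $A \in \mathbb{C}^{n\times p}$, and $T$ is lower triangular in $\mathbb{C}^{p\times p}$ with positive real elements on the diagonal. Then $|J(Y \to A)| = \prod_{k=1}^{p} T_{kk}^{2n}$ and $|J(A \to Y)| = \prod_{k=1}^{p} T_{kk}^{-2n}$.

Proof. $Y = AT$ implies $Y^T = T^TA^T$, where $Y^T \in \mathbb{C}^{p\times n}$, $A^T \in \mathbb{C}^{p\times n}$, and $T^T \in \mathbb{C}^{p\times p}$. $T^T$ is upper triangular. By lemma 2, $|J(Y^T \to A^T)| = |\det T^T|^{2n} = \prod_{k=1}^{p} T_{kk}^{2n}$, which implies $|J(Y \to A)| = \prod_{k=1}^{p} T_{kk}^{2n}$ and $|J(A \to Y)| = \prod_{k=1}^{p} T_{kk}^{-2n}$. □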
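Lemmas 4 and 5 can be checked the same way. This sketch (assuming NumPy, with a hypothetical helper `realified_jacobian` that realifies a linear map column by column) uses one upper and one lower triangular $T$ sharing the same positive diagonal, and compares the Jacobian of $A \mapsto AT$ with $\prod_k T_{kk}^{2n}$ in both cases.

```python
import numpy as np

def realified_jacobian(f, shape):
    """Real Jacobian matrix of a real-linear map f on complex matrices of `shape`."""
    size = int(np.prod(shape))
    columns = []
    for unit in (1.0, 1.0j):                  # real direction, then imaginary direction
        for idx in range(size):
            E = np.zeros(size, dtype=complex)
            E[idx] = unit
            out = np.asarray(f(E.reshape(shape)), dtype=complex).ravel()
            columns.append(np.concatenate([out.real, out.imag]))
    return np.column_stack(columns)

rng = np.random.default_rng(1)
n, p = 4, 3
diag = rng.uniform(0.8, 1.5, size=p)          # positive real diagonal
M = rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p))
T_upper = np.triu(M, k=1) + np.diag(diag)     # setting of lemma 4
T_lower = np.tril(M, k=-1) + np.diag(diag)    # setting of lemma 5
expected = np.prod(diag ** (2 * n))           # prod_k T_kk^(2n) in both lemmas

jacobians = []
for T in (T_upper, T_lower):
    J = realified_jacobian(lambda A, T=T: A @ T, (n, p))
    jacobians.append(abs(np.linalg.det(J)))
print(jacobians, expected)
```

Only the diagonal of $T$ matters, which is why both triangular shapes give the same value.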
Theorem 23 Let $Y = TAT^H$ be a change of complex variables between $Y$ and $A$, where $Y \in \mathbb{C}^{p\times p}$, $A \in \mathbb{C}^{p\times p}$, and $T$ is lower triangular in $\mathbb{C}^{p\times p}$ with positive real elements on the diagonal. Let $B = TT^H$. Then

$$|J(Y \to A)| = \prod_{k=1}^{p} T_{kk}^{4p} = |\det T|^{4p} = |\det B|^{2p} = (\det B)^{2p}.$$

Proof. Consider $Y = TAT^H$ as two transformations, $Y_1 = AT^H$ and $Y = TY_1$. $T^H$ is upper triangular with positive diagonal elements. Apply lemma 4. Then

$$|J(Y_1 \to A)| = \prod_{k=1}^{p} T_{kk}^{2p} = (\det T)^{2p}.$$

Now apply lemma 3 to obtain

$$|J(Y \to Y_1)| = \prod_{k=1}^{p} T_{kk}^{2p} = (\det T)^{2p}.$$

Thus

$$|J(Y \to A)| = |J(Y \to Y_1)| \cdot |J(Y_1 \to A)| = |\det T|^{4p} = |\det TT^H|^{2p} = |\det B|^{2p} = \prod_{k=1}^{p} T_{kk}^{4p}.$$

The magnitude symbols may be dropped since $T_{kk}$ is real for all $k$. □
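A quick numerical check of theorem 23, as a sketch assuming NumPy: realify the linear map $A \mapsto TAT^H$ on all of $\mathbb{C}^{p\times p}$ and compare its determinant with $\prod_k T_{kk}^{4p}$ and with $(\det B)^{2p}$. The helper `realified_jacobian` is a hypothetical name introduced here.

```python
import numpy as np

def realified_jacobian(f, shape):
    """Real Jacobian matrix of a real-linear map f on complex matrices of `shape`."""
    size = int(np.prod(shape))
    columns = []
    for unit in (1.0, 1.0j):                  # real direction, then imaginary direction
        for idx in range(size):
            E = np.zeros(size, dtype=complex)
            E[idx] = unit
            out = np.asarray(f(E.reshape(shape)), dtype=complex).ravel()
            columns.append(np.concatenate([out.real, out.imag]))
    return np.column_stack(columns)

rng = np.random.default_rng(2)
p = 3
T = np.tril(rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p)), k=-1)
T = T + np.diag(rng.uniform(0.8, 1.5, size=p))   # positive real diagonal
B = T @ T.conj().T

jac = abs(np.linalg.det(realified_jacobian(lambda A: T @ A @ T.conj().T, (p, p))))
by_diag = np.prod(np.diag(T).real ** (4 * p))    # prod_k T_kk^(4p)
by_B = np.linalg.det(B).real ** (2 * p)          # (det B)^(2p); det B is real and positive
print(jac, by_diag, by_B)
```

Here $A$ ranges over all complex matrices, not only Hermitian ones; the Hermitian case of theorem 24 needs a different coordinate system.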

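Example 3 This quick example illustrates the minute details of the action in theorem 23 for $p = 3$. Here

$$Y = TAT^H = \begin{pmatrix} T_{11} & 0 & 0\\ T_{21} & T_{22} & 0\\ T_{31} & T_{32} & T_{33} \end{pmatrix}\begin{pmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23}\\ A_{31} & A_{32} & A_{33} \end{pmatrix}\begin{pmatrix} T_{11} & T_{21}^* & T_{31}^*\\ 0 & T_{22} & T_{32}^*\\ 0 & 0 & T_{33} \end{pmatrix},$$

whose elements are

$$\begin{aligned}
Y_{11} &= T_{11}A_{11}T_{11}\\
Y_{12} &= T_{11}A_{11}T_{21}^* + T_{11}A_{12}T_{22}\\
Y_{13} &= T_{11}A_{11}T_{31}^* + T_{11}A_{12}T_{32}^* + T_{11}A_{13}T_{33}\\
Y_{21} &= T_{21}A_{11}T_{11} + T_{22}A_{21}T_{11}\\
Y_{22} &= T_{21}A_{11}T_{21}^* + T_{21}A_{12}T_{22} + T_{22}A_{21}T_{21}^* + T_{22}A_{22}T_{22}\\
Y_{23} &= T_{21}A_{11}T_{31}^* + T_{21}A_{12}T_{32}^* + T_{21}A_{13}T_{33} + T_{22}A_{21}T_{31}^* + T_{22}A_{22}T_{32}^* + T_{22}A_{23}T_{33}\\
Y_{31} &= T_{31}A_{11}T_{11} + T_{32}A_{21}T_{11} + T_{33}A_{31}T_{11}\\
Y_{32} &= T_{31}A_{11}T_{21}^* + T_{31}A_{12}T_{22} + T_{32}A_{21}T_{21}^* + T_{32}A_{22}T_{22} + T_{33}A_{31}T_{21}^* + T_{33}A_{32}T_{22}\\
Y_{33} &= T_{31}A_{11}T_{31}^* + T_{31}A_{12}T_{32}^* + T_{31}A_{13}T_{33} + T_{32}A_{21}T_{31}^* + T_{32}A_{22}T_{32}^* + T_{32}A_{23}T_{33}\\
&\quad + T_{33}A_{31}T_{31}^* + T_{33}A_{32}T_{32}^* + T_{33}A_{33}T_{33}.
\end{aligned}$$

Recall that each $A_{ij} = A_{Rij} + iA_{Iij}$. We get the following differentials, where each omitted term involves some $dA_{kl}$ with $k \le i$, $l \le j$, and $(k,l) \ne (i,j)$, so that, under a suitable ordering of the variables, the Jacobian matrix is triangular and these terms do not affect its determinant:

$$\begin{aligned}
dY_{R11} &= T_{11}^2\,dA_{R11}, & dY_{I11} &= T_{11}^2\,dA_{I11}\\
dY_{R21} &= T_{11}T_{22}\,dA_{R21} + \cdots, & dY_{I21} &= T_{11}T_{22}\,dA_{I21} + \cdots\\
dY_{R12} &= T_{11}T_{22}\,dA_{R12} + \cdots, & dY_{I12} &= T_{11}T_{22}\,dA_{I12} + \cdots\\
dY_{R13} &= T_{11}T_{33}\,dA_{R13} + \cdots, & dY_{I13} &= T_{11}T_{33}\,dA_{I13} + \cdots\\
dY_{R31} &= T_{11}T_{33}\,dA_{R31} + \cdots, & dY_{I31} &= T_{11}T_{33}\,dA_{I31} + \cdots\\
dY_{R22} &= T_{22}^2\,dA_{R22} + \cdots, & dY_{I22} &= T_{22}^2\,dA_{I22} + \cdots\\
dY_{R32} &= T_{22}T_{33}\,dA_{R32} + \cdots, & dY_{I32} &= T_{22}T_{33}\,dA_{I32} + \cdots\\
dY_{R23} &= T_{22}T_{33}\,dA_{R23} + \cdots, & dY_{I23} &= T_{22}T_{33}\,dA_{I23} + \cdots\\
dY_{R33} &= T_{33}^2\,dA_{R33} + \cdots, & dY_{I33} &= T_{33}^2\,dA_{I33} + \cdots
\end{aligned}$$

Therefore

$$(dY) = T_{11}^{12}T_{22}^{12}T_{33}^{12}\,(dA) = (\det T)^{4p}(dA),$$

where $p = 3$. Let $B = TT^H$. Thus

$$(dY) = (\det T)^{4p}(dA) = (\det B)^{2p}(dA).$$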

Theorem 24 Let $Y = TAT^H$ be a change of complex variables between $Y$ and $A$, where $A = A^H > 0$ and $Y$ are both in $\mathbb{C}^{p\times p}$, and $T \in \mathbb{C}^{p\times p}$ is lower triangular with positive real diagonal elements. Let $B = TT^H$. Then $|J(Y \to A)| = (\det T)^{2p} = (\det B)^{p}$ and $|J(A \to Y)| = (\det T)^{-2p} = (\det B)^{-p}$.

Proof. By corollary 36 there exists a unique $p \times p$ lower triangular matrix $L$ with positive real diagonal elements such that $A = LL^H$. Thus

$$Y = TLL^HT^H = CC^H,$$

where $C = TL$. $C$ is lower triangular in $\mathbb{C}^{p\times p}$ with positive real diagonal elements $C_{kk} = T_{kk}L_{kk}$. By theorem 26,

$$|J(A \to L)| = 2^p \prod_{k=1}^{p} L_{kk}^{2(p-k)+1}.$$

By lemma 6,

$$|J(C \to L)| = \prod_{k=1}^{p} T_{kk}^{2k-1}.$$

By theorem 26,

$$|J(Y \to C)| = 2^p \prod_{k=1}^{p} C_{kk}^{2(p-k)+1} = 2^p \prod_{k=1}^{p} (T_{kk}L_{kk})^{2(p-k)+1}.$$

Now we put the Jacobian together, taking the inverse of the Jacobian $|J(A \to L)|$. The factors $2^p$ and the powers of $L_{kk}$ cancel, and the exponent of $T_{kk}$ is $(2(p-k)+1) + (2k-1) = 2p$:

$$|J(Y \to A)| = |J(Y \to C)| \cdot |J(C \to L)| \cdot |J(L \to A)| = \prod_{k=1}^{p} T_{kk}^{2p} = (\det T)^{2p} = (\det TT^H)^{p} = (\det B)^{p}.$$

Therefore

$$|J(Y \to A)| = (\det T)^{2p} = (\det B)^{p},$$

and therefore

$$|J(A \to Y)| = (\det T)^{-2p} = (\det B)^{-p}.$$

□
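Theorem 24 concerns Hermitian $A$, which has only $p^2$ real coordinates (the real diagonal plus the real and imaginary parts of the strict lower triangle). The sketch below, assuming NumPy, encodes Hermitian matrices in those coordinates with two hypothetical helpers, builds the Jacobian matrix of $A \mapsto TAT^H$ column by column, and compares it with $(\det T)^{2p} = (\det B)^{p}$.

```python
import numpy as np

def herm_to_coords(H):
    """p^2 real coordinates: diagonal, then Re and Im of the strict lower triangle."""
    r, c = np.tril_indices(H.shape[0], k=-1)
    return np.concatenate([np.diag(H).real, H[r, c].real, H[r, c].imag])

def coords_to_herm(v, p):
    """Inverse of herm_to_coords: rebuild the Hermitian matrix."""
    r, c = np.tril_indices(p, k=-1)
    m = len(r)
    H = np.diag(v[:p]).astype(complex)
    lower = v[p:p + m] + 1j * v[p + m:]
    H[r, c] = lower
    H[c, r] = lower.conj()
    return H

rng = np.random.default_rng(3)
p = 3
T = np.tril(rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p)), k=-1)
T = T + np.diag(rng.uniform(0.8, 1.5, size=p))   # positive real diagonal
B = T @ T.conj().T

# Column i: image of the i-th coordinate direction under A -> T A T^H.
J = np.column_stack([
    herm_to_coords(T @ coords_to_herm(e, p) @ T.conj().T) for e in np.eye(p * p)
])
jac = abs(np.linalg.det(J))
expected = np.prod(np.diag(T).real) ** (2 * p)   # (det T)^(2p)
print(jac, expected)
```

The positivity of $A$ plays no role in the value of the Jacobian; it matters only for the existence of the factorization $A = LL^H$ used in the proof.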

Proposition 31 Let $A$ and $G$ both be lower triangular complex $p \times p$ matrices whose diagonal elements are complex. Let $T = AG$ define a change of variables between $T$ and $A$. Then

$$|J(T \to A)| = \prod_{k=1}^{p} |G_{kk}|^{2(p-k+1)},$$

where $G_{kk} = G_{Rkk} + iG_{Ikk}$, and

$$|J(A \to T)| = \prod_{k=1}^{p} |G_{kk}|^{-2(p-k+1)}.$$

This is the second equation of Khatri section 2.5 [137], stated without proof, where he uses $J(T; A) = |J(T \to A)|$. This result differs from Khatri's result.


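Proof.

$$T = AG = \begin{pmatrix} A_{11} & & & \\ A_{21} & A_{22} & & \\ \vdots & & \ddots & \\ A_{p1} & A_{p2} & \cdots & A_{pp} \end{pmatrix}\begin{pmatrix} G_{11} & & & \\ G_{21} & G_{22} & & \\ \vdots & & \ddots & \\ G_{p1} & G_{p2} & \cdots & G_{pp} \end{pmatrix} = \begin{pmatrix} A_{11}G_{11} & & & \\ A_{21}G_{11} + A_{22}G_{21} & A_{22}G_{22} & & \\ \vdots & & \ddots & \\ \sum_{k=1}^{p} A_{pk}G_{k1} & \sum_{k=2}^{p} A_{pk}G_{k2} & \cdots & A_{pp}G_{pp} \end{pmatrix},$$

where $T_{ij} = \sum_{k=j}^{i} A_{ik}G_{kj}$ for $i \ge j$ is a typical element. We expand this element:

$$T_{ij} = \sum_{k=j}^{i} \left[(A_{Rik}G_{Rkj} - A_{Iik}G_{Ikj}) + i(A_{Iik}G_{Rkj} + A_{Rik}G_{Ikj})\right].$$

We now compute the Jacobian of this transformation, $|J(T \to A)|$. Consider the Jacobian matrix

$$\frac{\partial(T_{R11}, T_{I11}, T_{R21}, T_{I21}, \ldots, T_{Rpp}, T_{Ipp})}{\partial(A_{R11}, A_{I11}, A_{R21}, A_{I21}, \ldots, A_{Rpp}, A_{Ipp})},$$

whose rows are the gradients of the real and imaginary parts of the elements of $T$. The partial derivatives yield a block structure. Since $\partial T_{Rij}/\partial A_{Rik} = G_{Rkj}$, $\partial T_{Rij}/\partial A_{Iik} = -G_{Ikj}$, $\partial T_{Iij}/\partial A_{Rik} = G_{Ikj}$, and $\partial T_{Iij}/\partial A_{Iik} = G_{Rkj}$, every nonzero $2 \times 2$ block has the form

$$\mathfrak{G}_{kj} = \begin{pmatrix} G_{Rkj} & -G_{Ikj}\\ G_{Ikj} & G_{Rkj} \end{pmatrix}.$$

Row $i$ of $T$ depends only on row $i$ of $A$, so with the variables ordered row by row the Jacobian matrix is block diagonal. The block for row $i$ pairs $(T_{i1}, \ldots, T_{ii})$ with $(A_{i1}, \ldots, A_{ii})$; its $(j,k)$ block is $\mathfrak{G}_{kj}$ for $k \ge j$ and $0$ otherwise. What you should be looking for is the pattern. The first three diagonal blocks are

$$\mathfrak{G}_{11}, \qquad \begin{pmatrix} \mathfrak{G}_{11} & \mathfrak{G}_{21}\\ 0 & \mathfrak{G}_{22} \end{pmatrix}, \qquad \begin{pmatrix} \mathfrak{G}_{11} & \mathfrak{G}_{21} & \mathfrak{G}_{31}\\ 0 & \mathfrak{G}_{22} & \mathfrak{G}_{32}\\ 0 & 0 & \mathfrak{G}_{33} \end{pmatrix}.$$

The Jacobian is the magnitude of the determinant of this matrix. Partition it as

$$\begin{pmatrix} \mathfrak{G}_{11} & {}_1H_{12}\\ {}_1H_{21} & {}_1H_{22} \end{pmatrix},$$

where ${}_1H_{12} = 0$ and ${}_1H_{21} = 0$, and $0$ is a matrix of zeros of appropriate dimensions. The lower left subscript on $H$ identifies the block number: it is the number of row blocks that are excluded from the full Jacobian matrix. We continue with this sequential partitioning scheme to compute the Jacobian by using the partitioned matrix determinant lemma 45:

$$\left|\det\begin{pmatrix} \mathfrak{G}_{11} & {}_1H_{12}\\ {}_1H_{21} & {}_1H_{22} \end{pmatrix}\right| = |\det\mathfrak{G}_{11}| \cdot \left|\det\left[{}_1H_{22} - {}_1H_{21}\mathfrak{G}_{11}^{-1}\,{}_1H_{12}\right]\right| = |\det\mathfrak{G}_{11}| \cdot |\det[{}_1H_{22}]| = \cdots = \prod_{k=1}^{p} |\det\mathfrak{G}_{kk}|^{p-k+1}.$$

At each step the block split off is a diagonal block $\mathfrak{G}_{kk}$; since the block for row $i$ of $T$ is block upper triangular with diagonal blocks $\mathfrak{G}_{11}, \ldots, \mathfrak{G}_{ii}$, the block $\mathfrak{G}_{kk}$ is split off once for every row $i \ge k$, that is, $p-k+1$ times. Note that

$$|\det\mathfrak{G}_{kk}| = \left|\det\begin{pmatrix} G_{Rkk} & -G_{Ikk}\\ G_{Ikk} & G_{Rkk} \end{pmatrix}\right| = G_{Rkk}^2 + G_{Ikk}^2 = |G_{kk}|^2,$$

where $G_{kk} = G_{Rkk} + iG_{Ikk}$. Therefore,

$$|J(T \to A)| = \prod_{k=1}^{p} |G_{kk}|^{2(p-k+1)}.$$

By the inverse property of Jacobians,

$$|J(A \to T)| = \prod_{k=1}^{p} |G_{kk}|^{-2(p-k+1)}.$$

□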
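Proposition 31 can be spot-checked numerically. In this sketch (assuming NumPy) the variable $A$ lives on lower triangular complex matrices with complex diagonal, encoded by two hypothetical helpers as the real and imaginary parts of the $p(p+1)/2$ entries on and below the diagonal; the Jacobian of $A \mapsto AG$ is compared with $\prod_k |G_{kk}|^{2(p-k+1)}$.

```python
import numpy as np

def tril_to_coords(M):
    """Re and Im of the entries on and below the diagonal."""
    r, c = np.tril_indices(M.shape[0])
    return np.concatenate([M[r, c].real, M[r, c].imag])

def coords_to_tril(v, p):
    """Inverse of tril_to_coords."""
    r, c = np.tril_indices(p)
    m = len(r)
    M = np.zeros((p, p), dtype=complex)
    M[r, c] = v[:m] + 1j * v[m:]
    return M

rng = np.random.default_rng(4)
p = 4
# Constant G: lower triangular with a complex diagonal of moderate modulus.
G = np.tril(rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p)), k=-1)
G = G + np.diag(rng.uniform(0.8, 1.5, p) * np.exp(1j * rng.uniform(0, 2 * np.pi, p)))

N = p * (p + 1)                      # two real coordinates per lower-triangular entry
J = np.column_stack([tril_to_coords(coords_to_tril(e, p) @ G) for e in np.eye(N)])
jac = abs(np.linalg.det(J))
k = np.arange(1, p + 1)
expected = np.prod(np.abs(np.diag(G)) ** (2 * (p - k + 1)))
print(jac, expected)
```

Exchanging the roles (fixing $A$ and varying $G$) gives the check for proposition 33 with only a change of the lambda and the expected exponent.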


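Proposition 32 Let $A$ and $G$ both be lower triangular complex $p \times p$ matrices with real diagonal elements. Let $T = AG$ define a change of variables between $A$ and $T$. Then the Jacobians of the transformations are $|J(T \to A)| = \prod_{k=1}^{p} |G_{kk}|^{2(p-k)+1}$ and $|J(A \to T)| = \prod_{k=1}^{p} |G_{kk}|^{-(2(p-k)+1)}$. This is the fourth equation of Khatri section 2.5 [137], which is stated without proof. This result differs from Khatri's.

Proof. Notice that since $A$ and $G$ are both lower triangular with real diagonal elements, $T$ also is lower triangular with real diagonal elements. A typical element of $T$ is given by $T_{ij} = T_{Rij} + iT_{Iij}$, where

$$T_{ij} = \sum_{k=j}^{i} \left[(A_{Rik}G_{Rkj} - A_{Iik}G_{Ikj}) + i(A_{Iik}G_{Rkj} + A_{Rik}G_{Ikj})\right] = \sum_{k=j}^{i-1} \left[(A_{Rik}G_{Rkj} - A_{Iik}G_{Ikj}) + i(A_{Iik}G_{Rkj} + A_{Rik}G_{Ikj})\right] + A_{ii}G_{ij}.$$

For example,

$$T_{11} = A_{11}G_{11}, \qquad T_{21} = A_{R21}G_{11} + i(A_{I21}G_{11}) + A_{22}G_{21}.$$

We separate the real part of the equation from the imaginary part. This implies

$$T_{R21} = A_{R21}G_{11} + A_{22}G_{R21}, \qquad T_{I21} = A_{I21}G_{11} + A_{22}G_{I21}.$$

We repeat this procedure for each element that is not purely real:

$$\begin{aligned}
T_{22} &= A_{22}G_{22}\\
T_{R31} &= A_{R31}G_{11} + A_{R32}G_{R21} - A_{I32}G_{I21} + A_{33}G_{R31}\\
T_{I31} &= A_{I31}G_{11} + A_{I32}G_{R21} + A_{R32}G_{I21} + A_{33}G_{I31}\\
T_{R32} &= A_{R32}G_{22} + A_{33}G_{R32}\\
T_{I32} &= A_{I32}G_{22} + A_{33}G_{I32}\\
T_{33} &= A_{33}G_{33}
\end{aligned}$$

Then, for $p = 3$, with rows ordered $(T_{11}, T_{R21}, T_{I21}, T_{22}, T_{R31}, T_{I31}, T_{R32}, T_{I32}, T_{33})$ and columns ordered $(A_{11}, A_{R21}, A_{I21}, A_{22}, A_{R31}, A_{I31}, A_{R32}, A_{I32}, A_{33})$, the Jacobian $|J(T \to A)|$ is computed as the magnitude of the determinant of

$$\begin{pmatrix}
G_{11} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & G_{11} & 0 & G_{R21} & 0 & 0 & 0 & 0 & 0\\
0 & 0 & G_{11} & G_{I21} & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & G_{22} & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & G_{11} & 0 & G_{R21} & -G_{I21} & G_{R31}\\
0 & 0 & 0 & 0 & 0 & G_{11} & G_{I21} & G_{R21} & G_{I31}\\
0 & 0 & 0 & 0 & 0 & 0 & G_{22} & 0 & G_{R32}\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & G_{22} & G_{I32}\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & G_{33}
\end{pmatrix}.$$

This matrix is upper triangular, so $|J(T \to A)| = |G_{11}|^5|G_{22}|^3|G_{33}|$. In general, in row $i$ the real element $T_{ii}$ contributes $|G_{ii}|$, and the real and imaginary parts of each $T_{ij}$, $j < i$, contribute $G_{jj}^2$, so that $|J(T \to A)| = \prod_{k=1}^{p} |G_{kk}|^{2(p-k)+1}$.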
Proposition 33 Let $A$ and $G$ both be lower triangular complex $p \times p$ matrices with complex diagonal elements. Let $T = AG$ define a change of variables between $G$ and $T$. Then the Jacobians of the transformations are $|J(T \to G)| = \prod_{k=1}^{p} |A_{kk}|^{2k}$ and $|J(G \to T)| = \prod_{k=1}^{p} |A_{kk}|^{-2k}$. This is the first equation of Khatri section 2.5 [137], which was stated without proof. Compare this closely with proposition 31: there the change of variables was between $T$ and $A$, with $G$ constant, while here the roles of the constant and variable matrices have been exchanged. This result is different from Khatri's.

Proof. Column $j$ of $T$ depends only on column $j$ of $G$, since $T_{ij} = \sum_{k=j}^{i} A_{ik}G_{kj}$. Order the variables column by column and write

$$\mathfrak{A}_{ik} = \begin{pmatrix} A_{Rik} & -A_{Iik}\\ A_{Iik} & A_{Rik} \end{pmatrix}$$

for the real $2 \times 2$ block that maps $(G_{Rkj}, G_{Ikj})$ to its contribution to $(T_{Rij}, T_{Iij})$. For $p = 3$ the Jacobian matrix is then block diagonal, with blocks

$$\begin{pmatrix} \mathfrak{A}_{11} & 0 & 0\\ \mathfrak{A}_{21} & \mathfrak{A}_{22} & 0\\ \mathfrak{A}_{31} & \mathfrak{A}_{32} & \mathfrak{A}_{33} \end{pmatrix}, \qquad \begin{pmatrix} \mathfrak{A}_{22} & 0\\ \mathfrak{A}_{32} & \mathfrak{A}_{33} \end{pmatrix}, \qquad \mathfrak{A}_{33},$$

corresponding to the first, second, and third columns of $G$. You should be looking at the pattern of the elements: $\mathfrak{A}_{kk}$ appears on the diagonal once for every column $j \le k$, that is, $k$ times. This patterned matrix gives a Jacobian determinant

$$|J(T \to G)| = \prod_{k=1}^{p} |\det\mathfrak{A}_{kk}|^{k} = \prod_{k=1}^{p} |A_{kk}|^{2k},$$

where $A_{kk} = A_{Rkk} + iA_{Ikk}$ and

$$\det\mathfrak{A}_{kk} = \det\begin{pmatrix} A_{Rkk} & -A_{Ikk}\\ A_{Ikk} & A_{Rkk} \end{pmatrix} = A_{Rkk}^2 + A_{Ikk}^2 = |A_{kk}|^2.$$

By the inversion property of Jacobians, we also have

$$|J(G \to T)| = \prod_{k=1}^{p} |A_{kk}|^{-2k}.$$

□


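Lemma 6 Let $A$ and $G$ both be lower triangular complex $p \times p$ matrices with real-valued diagonal elements. Let $T = AG$ define a change of variables between $G$ and $T$. Then the Jacobians of the transformations are $|J(T \to G)| = \prod_{k=1}^{p} |A_{kk}|^{2k-1}$ and $|J(G \to T)| = \prod_{k=1}^{p} |A_{kk}|^{-(2k-1)}$. This is the third equation of Khatri section 2.5 [137], which was stated without proof. This result differs from Khatri's.

Proof. For $p = 3$, with rows ordered $(T_{11}, T_{R21}, T_{I21}, T_{22}, T_{R31}, T_{I31}, T_{R32}, T_{I32}, T_{33})$ and columns ordered $(G_{11}, G_{R21}, G_{I21}, G_{22}, G_{R31}, G_{I31}, G_{R32}, G_{I32}, G_{33})$, the Jacobian $|J(T \to G)|$ is computed as

$$\left|\det\begin{pmatrix}
A_{11} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
A_{R21} & A_{22} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
A_{I21} & 0 & A_{22} & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & A_{22} & 0 & 0 & 0 & 0 & 0\\
A_{R31} & A_{R32} & -A_{I32} & 0 & A_{33} & 0 & 0 & 0 & 0\\
A_{I31} & A_{I32} & A_{R32} & 0 & 0 & A_{33} & 0 & 0 & 0\\
0 & 0 & 0 & A_{R32} & 0 & 0 & A_{33} & 0 & 0\\
0 & 0 & 0 & A_{I32} & 0 & 0 & 0 & A_{33} & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & A_{33}
\end{pmatrix}\right| = |A_{11}|\,|A_{22}|^3\,|A_{33}|^5.$$

The matrix is lower triangular. In general, the real element $T_{ii}$ contributes the factor $|A_{ii}|$, and the real and imaginary parts of each $T_{ij}$, $j < i$, contribute $A_{ii}^2$, giving $|J(T \to G)| = \prod_{k=1}^{p} |A_{kk}|^{2k-1}$. By the inverse property for Jacobians,

$$|J(G \to T)| = \prod_{k=1}^{p} |A_{kk}|^{-(2k-1)}.$$

□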
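Lemma 6 and proposition 32 share one coordinate system: lower triangular complex matrices with a real diagonal, which have $p^2$ real coordinates. The sketch below (assuming NumPy; the helper names are hypothetical) builds the Jacobian of $G \mapsto AG$ and of $A \mapsto AG$ in those coordinates and compares them with $\prod_k |A_{kk}|^{2k-1}$ and $\prod_k |G_{kk}|^{2(p-k)+1}$. The diagonals are given random signs to exercise the absolute values.

```python
import numpy as np

def realdiag_tril_to_coords(M):
    """Real diagonal, then Re and Im of the strict lower triangle (p^2 numbers)."""
    r, c = np.tril_indices(M.shape[0], k=-1)
    return np.concatenate([np.diag(M).real, M[r, c].real, M[r, c].imag])

def coords_to_realdiag_tril(v, p):
    """Inverse of realdiag_tril_to_coords."""
    r, c = np.tril_indices(p, k=-1)
    m = len(r)
    M = np.diag(v[:p]).astype(complex)
    M[r, c] = v[p:p + m] + 1j * v[p + m:]
    return M

def jacobian_of(f, p):
    """|det| of the real Jacobian of a real-linear map f on this coordinate space."""
    J = np.column_stack([realdiag_tril_to_coords(f(coords_to_realdiag_tril(e, p)))
                         for e in np.eye(p * p)])
    return abs(np.linalg.det(J))

rng = np.random.default_rng(5)
p = 4

def random_factor():
    """Lower triangular complex matrix with a real, nonzero diagonal."""
    L = np.tril(rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p)), k=-1)
    return L + np.diag(rng.choice([-1.0, 1.0], p) * rng.uniform(0.8, 1.5, p))

A, G = random_factor(), random_factor()
k = np.arange(1, p + 1)
lemma6 = jacobian_of(lambda X: A @ X, p)     # T = AG with G variable
prop32 = jacobian_of(lambda X: X @ G, p)     # T = AG with A variable
expected6 = np.prod(np.abs(np.diag(A).real) ** (2 * k - 1))
expected32 = np.prod(np.abs(np.diag(G).real) ** (2 * (p - k) + 1))
print(lemma6, expected6, prop32, expected32)
```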


C.4 Jacobians Requiring Exterior Product Approach

It is true that only a few of the Jacobians to come will be derived using the exterior product. However, those that immediately follow are very important to our handling of matrix quadratic forms, such as the complex Wishart matrix. We begin slowly with an important case. This is worth following closely as an example of the power of using an exterior product approach for a nonlinear change of variables.


Theorem 25 Let $T$ be an upper triangular complex matrix of size $p \times p$ with positive real elements on the diagonal. Let $B = TT^H$. Then

$$|J(B \to T)| = 2^p \prod_{k=1}^{p} T_{kk}^{2k-1}$$

and

$$|J(T \to B)| = 2^{-p} \prod_{k=1}^{p} T_{kk}^{-(2k-1)}.$$

This is a complexification of a variation of Muirhead theorem 2.1.9 (p. 60) [187].


Proof. This is a complexification and slight expansion of Muirhead's proof. We begin by looking at the matrices:

$$B = TT^H = \begin{pmatrix} T_{11} & T_{12} & \cdots & T_{1p}\\ & T_{22} & \cdots & T_{2p}\\ & & \ddots & \vdots\\ & & & T_{pp} \end{pmatrix}\begin{pmatrix} T_{11} & & & \\ T_{12}^* & T_{22} & & \\ \vdots & \vdots & \ddots & \\ T_{1p}^* & T_{2p}^* & \cdots & T_{pp} \end{pmatrix}.$$

Note that the diagonal terms of $T^H$ do not have the asterisk since $T_{kk}^* = T_{kk}$, because $T_{kk} \in \mathbb{R}$. The entries of $TT^H$ follow a clear pattern. For $j \le k$,

$$B_{jk} = \sum_{l=k}^{p} T_{jl}T_{kl}^* = T_{jk}T_{kk} + \sum_{l=k+1}^{p} T_{jl}T_{kl}^*, \qquad B_{kk} = T_{kk}^2 + \sum_{l=k+1}^{p} |T_{kl}|^2,$$

and $B_{kj} = B_{jk}^*$. In particular, the last two columns are

$$\begin{pmatrix} T_{1,p-1}T_{p-1,p-1} + T_{1p}T_{p-1,p}^* & T_{1p}T_{pp}\\ T_{2,p-1}T_{p-1,p-1} + T_{2p}T_{p-1,p}^* & T_{2p}T_{pp}\\ \vdots & \vdots\\ T_{p-1,p-1}^2 + |T_{p-1,p}|^2 & T_{p-1,p}T_{pp}\\ T_{pp}T_{p-1,p}^* & T_{pp}^2 \end{pmatrix}.$$

To find $(dB)$ we take the exterior product of the differentials. We begin with the term $T_{pp}^2 = B_{pp}$ and work backwards through the array. This tactic simplifies the algebra in the following way. Once a term $dT_{ij}$ has been computed, it never needs to be computed again: in forming the overall exterior product, repeated differentials cause that product term to be zero, since $dT_{ij} \wedge dT_{ij} = 0$. Our next step is to form the differentials needed:

$$\begin{aligned}
B_{pp} &= T_{pp}^2\\
B_{p-1,p} &= T_{p-1,p}T_{pp} = [T_{R(p-1,p)} + iT_{I(p-1,p)}]T_{pp}\\
B_{R(p-1,p)} &= T_{R(p-1,p)}T_{pp}\\
B_{I(p-1,p)} &= T_{I(p-1,p)}T_{pp}\\
&\;\;\vdots\\
B_{R(1,p)} &= T_{R(1,p)}T_{pp}\\
B_{I(1,p)} &= T_{I(1,p)}T_{pp}\\
B_{p-1,p-1} &= T_{p-1,p-1}^2 + T_{p-1,p}T_{p-1,p}^*\\
B_{1,p-1} &= T_{1,p-1}T_{p-1,p-1} + T_{1p}T_{p-1,p}^*\\
B_{R(1,p-1)} &= T_{R(1,p-1)}T_{p-1,p-1} + \cdots\\
B_{I(1,p-1)} &= T_{I(1,p-1)}T_{p-1,p-1} + \cdots\\
&\;\;\vdots\\
B_{44} &= T_{44}^2 + \sum_{j=5}^{p} T_{4j}T_{4j}^*\\
B_{34} &= T_{34}T_{44} + \cdots\\
&\;\;\vdots\\
B_{11} &= T_{11}^2 + \cdots
\end{aligned}$$

and

$$\begin{aligned}
dB_{pp} &= 2T_{pp}\,dT_{pp}\\
dB_{R(p-1,p)} &= T_{pp}\,dT_{R(p-1,p)} + \cdots\\
dB_{I(p-1,p)} &= T_{pp}\,dT_{I(p-1,p)} + \cdots\\
&\;\;\vdots\\
dB_{R(1,p)} &= T_{pp}\,dT_{R(1,p)} + \cdots\\
dB_{I(1,p)} &= T_{pp}\,dT_{I(1,p)} + \cdots\\
dB_{p-1,p-1} &= 2T_{p-1,p-1}\,dT_{p-1,p-1} + \cdots\\
dB_{R(1,p-1)} &= T_{p-1,p-1}\,dT_{R(1,p-1)} + \cdots\\
dB_{I(1,p-1)} &= T_{p-1,p-1}\,dT_{I(1,p-1)} + \cdots\\
&\;\;\vdots\\
dB_{44} &= 2T_{44}\,dT_{44} + \cdots\\
dB_{R34} &= T_{44}\,dT_{R34} + \cdots\\
dB_{I34} &= T_{44}\,dT_{I34} + \cdots\\
&\;\;\vdots\\
dB_{11} &= 2T_{11}\,dT_{11} + \cdots
\end{aligned}$$

We examine $dB_{R(p-1,p)}$ as an example of the reduction in terms achieved due to the recurrence of differentials in different terms:

$$dB_{R(p-1,p)} = (dT_{R(p-1,p)})T_{pp} + T_{R(p-1,p)}\,dT_{pp}.$$

When the product $dB_{pp} \wedge dB_{R(p-1,p)}$ is formed we get

$$\begin{aligned}
dB_{pp} \wedge dB_{R(p-1,p)} &= 2T_{pp}\,dT_{pp} \wedge \left[T_{pp}\,dT_{R(p-1,p)} + T_{R(p-1,p)}\,dT_{pp}\right]\\
&= 2T_{pp}^2\,dT_{pp} \wedge dT_{R(p-1,p)} + 2T_{pp}T_{R(p-1,p)}\,dT_{pp} \wedge dT_{pp}\\
&= 2T_{pp}^2\,dT_{pp} \wedge dT_{R(p-1,p)},
\end{aligned}$$

since $dT_{pp} \wedge dT_{pp} = 0$.

Therefore, to simplify the algebra, we need only keep track of $dB_{ij}$ terms which have not already appeared in our sequence of computations. Also note that if both terms in a product have appeared in an earlier computation, they do not need to be considered again, because their differentials will be multiplied by the same differentials from earlier computations, yielding zero. This is first seen in our computation of $dB_{R(1,p-1)}$:

$$B_{R(1,p-1)} = T_{R(1,p-1)}T_{p-1,p-1} + T_{R(1,p)}T_{R(p-1,p)} + T_{I(1,p)}T_{I(p-1,p)},$$

where $dT_{R(1,p)}$ first appears in $dB_{R(1,p)}$, $dT_{R(p-1,p)}$ first appears in $dB_{R(p-1,p)}$, $dT_{I(1,p)}$ first appears in $dB_{I(1,p)}$, and $dT_{I(p-1,p)}$ first appears in $dB_{I(p-1,p)}$.

Note that for a Hermitian matrix $B = B^H$, we need only look at the superdiagonal elements, because doing so already generates terms involving the differentials of all real and all imaginary components. For example, consider the subdiagonal element $B_{p,p-1} = B_{p-1,p}^*$:

$$B_{p,p-1} = T_{pp}T_{p-1,p}^* = T_{pp}\left[T_{R(p-1,p)} - iT_{I(p-1,p)}\right],$$

$$B_{R(p,p-1)} = T_{pp}T_{R(p-1,p)}, \qquad dB_{R(p,p-1)} = T_{pp}\,dT_{R(p-1,p)} + \cdots,$$

$$B_{I(p,p-1)} = -T_{pp}T_{I(p-1,p)}, \qquad dB_{I(p,p-1)} = -T_{pp}\,dT_{I(p-1,p)} + \cdots.$$

Recall that we already have

$$dB_{R(p-1,p)} = T_{pp}\,dT_{R(p-1,p)} + \cdots, \qquad dB_{I(p-1,p)} = T_{pp}\,dT_{I(p-1,p)} + \cdots.$$

Thus, when the exterior product is taken, terms containing both $dB_{ij}$ and $dB_{ij}^*$ will be zero.

Observe that each diagonal term contains a factor 2. Thus $\bigwedge_{k=1}^{p} dB_{kk} = 2^p\left(\prod_{k=1}^{p} T_{kk}\right)\bigwedge_{k=1}^{p} dT_{kk} + \cdots$. The off-diagonal terms need a bit more care because the differentials for both the real and imaginary parts must be considered. Note that each differential in a given column $k$ has the factor $T_{kk}$. We can therefore think of taking the product of all the terms in the following array, where $T_{kk}^2$ is the result of $dB_{R(j,k)} \wedge dB_{I(j,k)}$:

$$\begin{pmatrix} 2T_{11} & T_{22}^2 & T_{33}^2 & \cdots & T_{pp}^2\\ & 2T_{22} & T_{33}^2 & \cdots & T_{pp}^2\\ & & 2T_{33} & \cdots & T_{pp}^2\\ & & & \ddots & \vdots\\ & & & & 2T_{pp} \end{pmatrix}$$

Column $k$ contributes $2T_{kk} \cdot T_{kk}^{2(k-1)} = 2T_{kk}^{2k-1}$. Thus $|J(B \to T)| = 2^p \prod_{k=1}^{p} T_{kk}^{2k-1}$. □
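The map $T \mapsto TT^H$ is nonlinear, but its differential $dB = (dT)T^H + T(dT)^H$ is real-linear in $dT$, so the Jacobian matrix at a given $T$ can be assembled exactly by pushing each coordinate direction through the differential. This sketch, assuming NumPy and hypothetical coordinate helpers (real diagonal, then real and imaginary parts of the strict upper triangle), compares the result with $2^p\prod_k T_{kk}^{2k-1}$ from theorem 25.

```python
import numpy as np

def upper_coords(M):
    """Real diagonal, then Re and Im of the strict upper triangle (p^2 numbers)."""
    r, c = np.triu_indices(M.shape[0], k=1)
    return np.concatenate([np.diag(M).real, M[r, c].real, M[r, c].imag])

def coords_to_upper(v, p):
    """Upper triangular matrix with real diagonal from its coordinates."""
    r, c = np.triu_indices(p, k=1)
    m = len(r)
    M = np.diag(v[:p]).astype(complex)
    M[r, c] = v[p:p + m] + 1j * v[p + m:]
    return M

rng = np.random.default_rng(6)
p = 4
T = np.triu(rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p)), k=1)
T = T + np.diag(rng.uniform(0.8, 1.5, size=p))   # positive real diagonal

columns = []
for e in np.eye(p * p):
    dT = coords_to_upper(e, p)
    dB = dT @ T.conj().T + T @ dT.conj().T          # differential of B = T T^H
    columns.append(upper_coords(dB))                # dB is Hermitian: upper triangle suffices
jac = abs(np.linalg.det(np.column_stack(columns)))
k = np.arange(1, p + 1)
expected = 2.0 ** p * np.prod(np.diag(T).real ** (2 * k - 1))
print(jac, expected)
```

Reading only the upper triangle of the Hermitian $dB$ mirrors the remark in the proof that the superdiagonal elements already carry all the information.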

Theorem 26 Let $T$ be a lower triangular complex matrix of size $p \times p$ with positive real elements on the diagonal. Let $A = TT^H$. Then $|J(A \to T)| = 2^p \prod_{k=1}^{p} T_{kk}^{2(p-k)+1}$ and $|J(T \to A)| = 2^{-p} \prod_{k=1}^{p} T_{kk}^{-(2(p-k)+1)}$. This is a complexification of Deemer and Olkin theorem 4.1 [67]. This is used by Khatri [137] in his proof of theorem 2.8 just before equation 2.8.1. This is Srivastava and Khatri problem 1.29 [257].

Proof. This proof follows the model set for theorem 25 by Muirhead rather than complexifying Deemer and Olkin's proof. The entries of $TT^H$ are given below.

For $j \ge k$,

$$A_{jk} = \sum_{l=1}^{k} T_{jl}T_{kl}^* = T_{jk}T_{kk} + \sum_{l=1}^{k-1} T_{jl}T_{kl}^*, \qquad A_{kk} = T_{kk}^2 + \sum_{l=1}^{k-1} |T_{kl}|^2,$$

and $A_{kj} = A_{jk}^*$. The first three columns are

$$\begin{pmatrix} T_{11}^2 & T_{11}T_{21}^* & T_{11}T_{31}^*\\ T_{21}T_{11} & T_{21}T_{21}^* + T_{22}^2 & T_{21}T_{31}^* + T_{22}T_{32}^*\\ T_{31}T_{11} & T_{31}T_{21}^* + T_{32}T_{22} & T_{31}T_{31}^* + T_{32}T_{32}^* + T_{33}^2\\ \vdots & \vdots & \vdots\\ T_{p1}T_{11} & T_{p1}T_{21}^* + T_{p2}T_{22} & T_{p1}T_{31}^* + T_{p2}T_{32}^* + T_{p3}T_{33} \end{pmatrix},$$

and the final column ($p$) is

$$\begin{pmatrix} T_{11}T_{p1}^*\\ T_{21}T_{p1}^* + T_{22}T_{p2}^*\\ \vdots\\ T_{p-1,1}T_{p1}^* + \cdots + T_{p-1,p-1}T_{p,p-1}^*\\ T_{p1}T_{p1}^* + \cdots + T_{p,p-1}T_{p,p-1}^* + T_{pp}^2 \end{pmatrix}.$$

From this matrix we compute our differentials:

$$\begin{aligned}
A_{11} &= T_{11}^2, & dA_{11} &= 2T_{11}\,dT_{11}\\
A_{21} &= T_{21}T_{11}, & &\\
A_{R21} &= T_{R21}T_{11}, & dA_{R21} &= T_{11}\,dT_{R21} + \cdots\\
A_{I21} &= T_{I21}T_{11}, & dA_{I21} &= T_{11}\,dT_{I21} + \cdots\\
&\;\;\vdots & &\\
A_{Rp1} &= T_{Rp1}T_{11}, & dA_{Rp1} &= T_{11}\,dT_{Rp1} + \cdots\\
A_{Ip1} &= T_{Ip1}T_{11}, & dA_{Ip1} &= T_{11}\,dT_{Ip1} + \cdots\\
A_{22} &= T_{21}[T_{R21} - iT_{I21}] + T_{22}^2, & dA_{22} &= 2T_{22}\,dT_{22} + \cdots
\end{aligned}$$

Observing the pattern, and recalling the patterns generated for the upper triangular case, we see that in the lower triangular case column $k$ contributes $2T_{kk}$ from $dA_{kk}$ and $T_{kk}^2$ from each of the $p-k$ complex elements below the diagonal, so that

$$|J(A \to T)| = 2^p \prod_{k=1}^{p} T_{kk}^{2(p-k)+1}, \qquad |J(T \to A)| = 2^{-p} \prod_{k=1}^{p} T_{kk}^{-(2(p-k)+1)}.$$

□
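The same differential trick checks theorem 26. In this sketch (assuming NumPy; the coordinate helpers are hypothetical) $T$ and the Hermitian matrix $A = TT^H$ are both encoded by their real diagonals and the real and imaginary parts of their strict lower triangles, and the Jacobian of $T \mapsto TT^H$ is compared with $2^p\prod_k T_{kk}^{2(p-k)+1}$.

```python
import numpy as np

def lower_coords(M):
    """Real diagonal, then Re and Im of the strict lower triangle (p^2 numbers)."""
    r, c = np.tril_indices(M.shape[0], k=-1)
    return np.concatenate([np.diag(M).real, M[r, c].real, M[r, c].imag])

def coords_to_lower(v, p):
    """Lower triangular matrix with real diagonal from its coordinates."""
    r, c = np.tril_indices(p, k=-1)
    m = len(r)
    M = np.diag(v[:p]).astype(complex)
    M[r, c] = v[p:p + m] + 1j * v[p + m:]
    return M

rng = np.random.default_rng(7)
p = 4
T = np.tril(rng.normal(size=(p, p)) + 1j * rng.normal(size=(p, p)), k=-1)
T = T + np.diag(rng.uniform(0.8, 1.5, size=p))   # positive real diagonal

columns = []
for e in np.eye(p * p):
    dT = coords_to_lower(e, p)
    dA = dT @ T.conj().T + T @ dT.conj().T          # differential of A = T T^H
    columns.append(lower_coords(dA))                # Hermitian dA: lower triangle suffices
jac = abs(np.linalg.det(np.column_stack(columns)))
k = np.arange(1, p + 1)
expected = 2.0 ** p * np.prod(np.diag(T).real ** (2 * (p - k) + 1))
print(jac, expected)
```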


Theorem 27 Let $T$ be an upper triangular complex matrix of size $p \times p$ with positive real elements on the diagonal. Let $A = T^HT$. Then

$$|J(A \to T)| = 2^p \prod_{k=1}^{p} T_{kk}^{2(p-k)+1}$$

and

$$|J(T \to A)| = 2^{-p} \prod_{k=1}^{p} T_{kk}^{-(2(p-k)+1)}.$$

This is a complexification of Muirhead's theorem 2.1.9 [187]. This is also Goodman equation 5.25 [92].


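Proof. This is a complexification of Muirhead's proof.

$$A = T^HT = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1p}\\ A_{12}^* & A_{22} & \cdots & A_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ A_{1p}^* & A_{2p}^* & \cdots & A_{pp} \end{pmatrix} = \begin{pmatrix} T_{11} & & & \\ T_{12}^* & T_{22} & & \\ \vdots & \vdots & \ddots & \\ T_{1p}^* & T_{2p}^* & \cdots & T_{pp} \end{pmatrix}\begin{pmatrix} T_{11} & T_{12} & \cdots & T_{1p}\\ & T_{22} & \cdots & T_{2p}\\ & & \ddots & \vdots\\ & & & T_{pp} \end{pmatrix}$$

$$= \begin{pmatrix} T_{11}^2 & T_{11}T_{12} & \cdots & T_{11}T_{1p}\\ T_{12}^*T_{11} & T_{12}^*T_{12} + T_{22}^2 & \cdots & T_{12}^*T_{1p} + T_{22}T_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ T_{1p}^*T_{11} & T_{1p}^*T_{12} + T_{2p}^*T_{22} & \cdots & T_{1p}^*T_{1p} + T_{2p}^*T_{2p} + \cdots + T_{pp}^2 \end{pmatrix}.$$

Note the similarity of this with the $A$ of theorem 26. Instead of forming the exterior product from the lower triangular terms of $A$, use the upper triangle of $A$:

$$\begin{aligned}
A_{11} &= T_{11}^2, & dA_{11} &= 2T_{11}\,dT_{11}\\
A_{12} &= T_{11}T_{12}, & &\\
A_{R12} &= T_{11}T_{R12}, & dA_{R12} &= T_{11}\,dT_{R12} + \cdots\\
A_{I12} &= T_{11}T_{I12}, & dA_{I12} &= T_{11}\,dT_{I12} + \cdots\\
&\;\;\vdots & &\\
A_{R1p} &= T_{11}T_{R1p}, & dA_{R1p} &= T_{11}\,dT_{R1p} + \cdots\\
A_{I1p} &= T_{11}T_{I1p}, & dA_{I1p} &= T_{11}\,dT_{I1p} + \cdots\\
A_{22} &= T_{12}^*(T_{R12} + iT_{I12}) + T_{22}^2, & dA_{22} &= 2T_{22}\,dT_{22} + \cdots
\end{aligned}$$

Thus

$$|J(A \to T)| = 2^p \prod_{k=1}^{p} T_{kk}^{2(p-k)+1}, \qquad |J(T \to A)| = 2^{-p} \prod_{k=1}^{p} T_{kk}^{-(2(p-k)+1)}.$$

□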


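Theorem 28 Let $Y \in M_p(\mathbb{C})$ and $A = Y^HY$. Then

$$|J(Y \to A)| = \frac{\pi^{p^2}}{C\Gamma_p(p)}$$

and

$$|J(A \to Y)| = \frac{C\Gamma_p(p)}{\pi^{p^2}}.$$

Proof. This theorem depends on integrals that are proven in the section of helpful integrals. From theorem 150 with $\Sigma = I$ we have

$$\int_{A>0} \operatorname{etr}(-A)(\det A)^{\alpha-p}(dA) = C\Gamma_p(\alpha).$$

From proposition 105,

$$\int_{M_p(\mathbb{C})} \operatorname{etr}(-Y^HY)\det(Y^HY)^{\alpha-p}(dY) = \frac{\pi^{p^2}\,C\Gamma_p(\alpha)}{C\Gamma_p(p)}.$$

Let $A = Y^HY$. Then

$$\frac{C\Gamma_p(p)}{\pi^{p^2}}\int_{M_p(\mathbb{C})} \operatorname{etr}(-Y^HY)(\det(Y^HY))^{\alpha-p}(dY) = C\Gamma_p(\alpha).$$

Comparing the two integrals, $(dA)$ corresponds to $C\Gamma_p(p)\,\pi^{-p^2}(dY)$. Therefore $|J(A \to Y)| = C\Gamma_p(p)/\pi^{p^2}$.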


Theorem 29 If the density of $Y \in \mathbb{C}^{p\times m}$ is a function of $YY^H$, $f(YY^H)(dY)$, then the density of $B = YY^H$ is given by

$$g(B)(dB) = \frac{|\det B|^{m-p}f(B)}{\pi^{-pm}\,C\Gamma_p(m)}(dB).$$

The Jacobian of the transformation is

$$|J(Y \to B)| = \frac{\pi^{pm}|\det B|^{m-p}}{C\Gamma_p(m)}.$$

This is theorem 1 of [256].

Proof. This is proven in theorem 67, which is a replication of Srivastava's derivation of the complex Wishart density. This theorem is stated here to keep it in context with similar theorems. This is a complexification of Anderson lemma 13.3.1 [26]. $C\Gamma_p(m)$ is defined in the section on helpful integrals.

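Theorem 30 Let $x = By$ be a linear change of variables from complex vector $x \in \mathbb{C}^n$ to complex vector $y$, where $B \in \mathbb{C}^{n\times n}$. Let $\operatorname{Re}(B) > 0$. Then

$$|J(x \to y)| = |\det B|^2.$$

Proof. I do not have a record of the pedigree of this result or its proof. I presume that it is known already to many people.

$$x = By = (x_R + ix_I) = (B_R + iB_I)(y_R + iy_I) = (B_Ry_R - B_Iy_I) + i(B_Iy_R + B_Ry_I).$$

Thus, this is a change of variables of the form

$$\begin{pmatrix} x_R\\ x_I \end{pmatrix} = \begin{pmatrix} B_R & -B_I\\ B_I & B_R \end{pmatrix}\begin{pmatrix} y_R\\ y_I \end{pmatrix}.$$

The Jacobian is

$$\begin{aligned}
|J(x \to y)| &= \left|\det\begin{pmatrix} B_R & -B_I\\ B_I & B_R \end{pmatrix}\right| = |\det(B_R)\det(B_R + B_IB_R^{-1}B_I)|\\
&= \left|[\det(B_R)]^2\det(I + B_R^{-1}B_IB_R^{-1}B_I)\right| = \left|[\det(B_R)]^2\det[I + (B_R^{-1}B_I)^2]\right|\\
&= \left|[\det(B_R)]^2\det[(I + iB_R^{-1}B_I)(I - iB_R^{-1}B_I)]\right| = |[\det(B_R)]^2| \cdot |\det(I + iB_R^{-1}B_I)|^2\\
&= |\det(B_R)\det(I + iB_R^{-1}B_I)|^2 = |\det(B_R + iB_I)|^2 = |\det B|^2,
\end{aligned}$$

where $\det(I - iB_R^{-1}B_I)$ is the complex conjugate of $\det(I + iB_R^{-1}B_I)$ because $B_R^{-1}B_I$ is real.

By the inverse property, $|J(y \to x)| = |\det B|^{-2}$ when it exists. When $B = B^H$, $\det B$ is real because the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $B$ are real and $\det B = \prod_{i=1}^{n} \lambda_i$, by Goodman corollary 2.1 proof [92]; if in addition $B$ is positive definite, then $|\det B| = \det B$. □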
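Theorem 30 is easy to confirm numerically. The sketch below (assuming NumPy) forms the real representation $\begin{pmatrix} B_R & -B_I\\ B_I & B_R \end{pmatrix}$, checks that it reproduces $x = By$ on stacked real and imaginary parts, compares its determinant with $|\det B|^2$, and illustrates the closing remark with a Hermitian positive definite matrix whose determinant is real and positive. The matrix size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
BR, BI = B.real, B.imag

# Real representation of x = By acting on the stacked vector (y_R, y_I).
R = np.block([[BR, -BI], [BI, BR]])

y = rng.normal(size=n) + 1j * rng.normal(size=n)
x = B @ y
stacked_x = R @ np.concatenate([y.real, y.imag])   # should equal (x_R, x_I)

lhs = abs(np.linalg.det(R))
rhs = abs(np.linalg.det(B)) ** 2
print(lhs, rhs)

# Hermitian positive definite case: det H is real and positive, so |det H| = det H.
H = B @ B.conj().T + np.eye(n)
d = np.linalg.det(H)
print(d)
```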

Theorem 31 Let $X = BY$ be a complex linear transformation between the variables $X \in \mathbb{C}^{n\times m}$ and $Y \in \mathbb{C}^{n\times m}$, where $B \in \mathbb{C}^{n\times n}$. Then $|J(X \to Y)| = |\det B|^{2m}$ and $|J(Y \to X)| = |\det B|^{-2m}$. This is a complexification of Muirhead [187] theorem 2.1.4.

Proof. Muirhead provides a proof for the real case which uses exterior products. I have provided a more traditional proof. Writing $X_k$ and $Y_k$ for the $k$th columns of $X$ and $Y$, we have $X_k = BY_k$, so the Jacobian matrix is block diagonal and

$$|J(X \to Y)| = |J(X_1 \to Y_1)J(X_2 \to Y_2)\cdots J(X_m \to Y_m)|.$$

By theorem 30, $|J(X_k \to Y_k)| = |\det B|^2$. Thus $|J(X \to Y)| = |\det B|^{2m}$ and $|J(Y \to X)| = |\det B|^{-2m}$. □

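Theorem 32 Let $Z \in \mathbb{C}^{n\times m}$, where the rank of $Z$ is $m$. Let $Z = H_1T$, where $H_1^HH_1 = I_m$ and $T$ is an $m \times m$ upper triangular matrix with positive diagonal elements. Let $H_2$ (a function of $H_1$) be an $n \times (n-m)$ matrix such that $H = [H_1, H_2]$ is a unitary $n \times n$ matrix. In column vector notation, let

$$H = [h_1, \ldots, h_m, h_{m+1}, \ldots, h_n],$$

where $h_1, \ldots, h_m$ belong to $H_1$. Then

$$(dZ) = \prod_{i=1}^{m} T_{ii}^{2n-m-i}\,(dT_{H_1})(H_1^HdH_1),$$

where

$$(H_1^HdH_1) = \bigwedge_{i=1}^{m}\bigwedge_{j=i+1}^{n}(h_j^Hdh_i)$$

and

$$(dT_{H_1}) = \left\{\bigwedge_{j=1}^{m}\left[(h_j^Hdh_j)T_{jj} + dT_{jj}\right]\right\}\bigwedge_{i=1}^{m-1}\bigwedge_{j=i+1}^{m} dT_{ij},$$

and $\bigwedge$ here is an exterior product operator. This is a complexification of Muirhead's theorem 2.1.13 [187].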
Proof. The following is a complexification and expansion of Muirhead's proof. First, recall by C. R. Rao 1b.2(ix) [213] that $Z$ can indeed be decomposed into $H_1T$, where $H_1^HH_1 = I_m$ and $T$ is upper triangular with positive real diagonal elements. Also, given a subunitary matrix such as $H_1$, we can always find a completion of $H_1$ to form a unitary matrix $H$.

Our goal is to find $(dZ)$ in terms of $(dT)$ and $(dH_1)$. What we will actually get is almost what we want, and it turns out to meet our needs. We start by invoking theorem 21: $Z = H_1T$ implies that

$$dZ = (dH_1)T + H_1\,dT.$$

Consider $H^HdZ$, which involves our completed unitary matrix $H$:

$$H^HdZ = \begin{pmatrix} H_1^H\\ H_2^H \end{pmatrix}dZ = \begin{pmatrix} H_1^H(dH_1)T + H_1^HH_1\,dT\\ H_2^H(dH_1)T + H_2^HH_1\,dT \end{pmatrix} = \begin{pmatrix} H_1^H(dH_1)T + dT\\ H_2^H(dH_1)T \end{pmatrix}.$$

Note that $dH_1$ here is a matrix, not an exterior product. Also note, by $H$ being unitary, that $H_1^HH_1 = I_m$ and $H_2^HH_1 = 0$, the zero matrix.


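By theorem 31, the exterior product of $H^HdZ$ is

$$(H^HdZ) = |\det H^H|^{2m}(dZ) = (dZ).$$

So, if we find $(H^HdZ)$, we have also found $(dZ)$. Now let us evaluate $(H^HdZ)$. To do this with the least effort, we will make use of the partition and a special property of the upper partition. The lower partition is simplest, so we begin there:

$$H_2^H(dH_1)T = (h_{m+1}, h_{m+2}, \ldots, h_n)^H(dh_1, dh_2, \ldots, dh_m)\,T = \begin{pmatrix} h_{m+1}^Hdh_1 & h_{m+1}^Hdh_2 & \cdots & h_{m+1}^Hdh_m\\ \vdots & \vdots & & \vdots\\ h_n^Hdh_1 & h_n^Hdh_2 & \cdots & h_n^Hdh_m \end{pmatrix}_{(n-m)\times m}T_{m\times m}.$$

Recalling that for any matrix $A$, $\det A = \det A^T$, and applying theorem 30 to row $j$ of $H_2^H(dH_1)T$, we obtain $|\det T|^2\bigwedge_{i=1}^{m} h_j^Hdh_i$. Thus the exterior product of all elements in $H_2^H(dH_1)T$ is given by

$$\bigwedge_{j=m+1}^{n}\left(|\det T|^2\bigwedge_{i=1}^{m} h_j^Hdh_i\right) = |\det T|^{2(n-m)}\bigwedge_{j=m+1}^{n}\bigwedge_{i=1}^{m} h_j^Hdh_i.$$

This also follows from lemma 4.

Now consider the upper partition, $H_1^H(dH_1)T + dT$. $T$ is upper triangular, and so is $dT$. Thus the lower triangle (below the diagonal) consists only of the elements of $H_1^H(dH_1)T$. Recall that $H_1^HH_1 = I_m$. Thus

$$d(H_1^HH_1) = d(I_m) = 0 = [dH_1^H]H_1 + H_1^H[dH_1],$$

which is the zero matrix. This means that

$$H_1^HdH_1 = -[dH_1^H]H_1 = -[H_1^HdH_1]^H.$$

Therefore $H_1^HdH_1$ is skew-Hermitian:

$$H_1^HdH_1 = \begin{pmatrix} i\operatorname{Im}(h_1^Hdh_1) & -[h_2^Hdh_1]^* & \cdots & -[h_m^Hdh_1]^*\\ h_2^Hdh_1 & i\operatorname{Im}(h_2^Hdh_2) & \cdots & -[h_m^Hdh_2]^*\\ \vdots & \vdots & \ddots & \vdots\\ h_m^Hdh_1 & h_m^Hdh_2 & \cdots & i\operatorname{Im}(h_m^Hdh_m) \end{pmatrix}_{m\times m}.$$

Note that for the case of real variables, the main diagonal consists of all zeros. Now evaluate $H_1^H(dH_1)T$, recalling that $T$ is upper triangular. This matrix is given below by columns. The first two columns are

$$\begin{pmatrix} i\operatorname{Im}(h_1^Hdh_1)T_{11} & i\operatorname{Im}(h_1^Hdh_1)T_{12} - [h_2^Hdh_1]^*T_{22}\\ (h_2^Hdh_1)T_{11} & (h_2^Hdh_1)T_{12} + i\operatorname{Im}(h_2^Hdh_2)T_{22}\\ (h_3^Hdh_1)T_{11} & (h_3^Hdh_1)T_{12} + (h_3^Hdh_2)T_{22}\\ \vdots & \vdots\\ (h_m^Hdh_1)T_{11} & (h_m^Hdh_1)T_{12} + (h_m^Hdh_2)T_{22} \end{pmatrix}.$$

The last column is

$$\begin{pmatrix} i\operatorname{Im}(h_1^Hdh_1)T_{1m} - \sum_{k=2}^{m}[h_k^Hdh_1]^*T_{km}\\ (h_2^Hdh_1)T_{1m} + i\operatorname{Im}(h_2^Hdh_2)T_{2m} - \sum_{k=3}^{m}[h_k^Hdh_2]^*T_{km}\\ \sum_{k=1}^{2}(h_3^Hdh_k)T_{km} + i\operatorname{Im}(h_3^Hdh_3)T_{3m} - \sum_{k=4}^{m}[h_k^Hdh_3]^*T_{km}\\ \vdots\\ \sum_{k=1}^{m-1}(h_m^Hdh_k)T_{km} + i\operatorname{Im}(h_m^Hdh_m)T_{mm} \end{pmatrix}.$$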

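In forming exterior products, once a term $(h_j^Hdh_i)$ has been included, repeated terms with the same index will cause that particular product to be zero, $(h_j^Hdh_i) \wedge (h_j^Hdh_i) = 0$. Also notice that $dZ \wedge (dZ)^* = 0$. The exterior product of elements below the main diagonal of $H_1^H(dH_1)T$ is also the exterior product of elements below the main diagonal of $H_1^H(dH_1)T + dT$. This is

$$T_{11}^{m-1}\bigwedge_{j=2}^{m}(h_j^Hdh_1) \wedge T_{22}^{m-2}\bigwedge_{j=3}^{m}(h_j^Hdh_2) \wedge \cdots \wedge T_{m-1,m-1}(h_m^Hdh_{m-1}) = \prod_{i=1}^{m} T_{ii}^{m-i}\bigwedge_{i=1}^{m-1}\bigwedge_{j=i+1}^{m}(h_j^Hdh_i).$$

Elements along the main diagonal of $H_1^H(dH_1)T + dT$ have an exterior product that is a bit more tedious. Since a typical element looks like

$$i\operatorname{Im}(h_j^Hdh_j)T_{jj} + dT_{jj} = (h_j^Hdh_j)T_{jj} + dT_{jj},$$

the exterior product is

$$\bigwedge_{j=1}^{m}\left[(h_j^Hdh_j)T_{jj} + dT_{jj}\right].$$

When this is expanded, it has one term that is $\bigwedge_{j=1}^{m} dT_{jj}$, and another of the form $\left(\prod_{j=1}^{m} T_{jj}\right)\bigwedge_{j=1}^{m}(h_j^Hdh_j)$. We do not get the simple form achieved when $h_j^Hdh_j = 0$ in the case of real variables.

Elements above the main diagonal have an exterior product of the form $\bigwedge_{i<j} dT_{ij} = \bigwedge_{i=1}^{m-1}\bigwedge_{j=i+1}^{m} dT_{ij}$. Putting this all together, we obtain

$$\begin{aligned}
(dZ) &= |\det T|^{2(n-m)}\bigwedge_{j=m+1}^{n}\bigwedge_{i=1}^{m}(h_j^Hdh_i) \wedge \prod_{i=1}^{m} T_{ii}^{m-i}\bigwedge_{i=1}^{m-1}\bigwedge_{j=i+1}^{m}(h_j^Hdh_i)\\
&\qquad \wedge \left\{\bigwedge_{j=1}^{m}\left[(h_j^Hdh_j)T_{jj} + dT_{jj}\right]\right\}\bigwedge_{i=1}^{m-1}\bigwedge_{j=i+1}^{m} dT_{ij}\\
&= \prod_{i=1}^{m} T_{ii}^{2n-m-i}\left\{\bigwedge_{j=1}^{m}\left[(h_j^Hdh_j)T_{jj} + dT_{jj}\right]\bigwedge_{i=1}^{m-1}\bigwedge_{j=i+1}^{m} dT_{ij}\right\}\bigwedge_{i=1}^{m}\bigwedge_{j=i+1}^{n}(h_j^Hdh_i)\\
&= \prod_{i=1}^{m} T_{ii}^{2n-m-i}\,(dT_{H_1})(H_1^HdH_1),
\end{aligned}$$

where we ignore sign changes and recall $T_{kk} > 0$, and where

$$(dT_{H_1}) = \left\{\bigwedge_{j=1}^{m}\left[(h_j^Hdh_j)T_{jj} + dT_{jj}\right]\right\}\bigwedge_{i=1}^{m-1}\bigwedge_{j=i+1}^{m} dT_{ij}, \qquad (H_1^HdH_1) = \bigwedge_{i=1}^{m}\bigwedge_{j=i+1}^{n}(h_j^Hdh_i).$$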


Theorem 33 Let $Z \in \mathbb{C}^{n\times m}$, where the rank of $Z$ is $m$. Let $Z = H_1T$, where $H_1^HH_1 = I_m$ and $T$ is an $m \times m$ upper triangular matrix with positive diagonal elements. Let $H_2$ (a function of $H_1$) be an $n \times (n-m)$ matrix such that $H = [H_1, H_2]$ is a unitary $n \times n$ matrix. In column vector notation, let

$$H = [h_1, h_2, \ldots, h_m, h_{m+1}, \ldots, h_n],$$

where $h_1, \ldots, h_m$ belong to $H_1$. Let $A = Z^HZ$. Then

$$(dZ) = (\det A)^{(2n-3m)/2}\left\{\bigwedge_{j=1}^{m}\left[(h_j^Hdh_j)T_{jj} + dT_{jj}\right]\right\}(dA_L)(H_1^HdH_1)\prod_{k=1}^{m} T_{kk}^{k},$$

where $(dA_L)$ is the exterior product of elements of $A$ below the main diagonal. This is a complexification of Muirhead theorem 2.1.14.


Proof.  This  is  a  complexification  and  expansion  of  Muirhead’s  proof.  From 


theorem  32, 


771  I  771 

{dz)=  lA 


dH,)T„  A  dT„\\  A  A  ■iTii  {H«dH,) 

j=l  )  1=1  j=i+l 


i 


Also,

$$A = Z^HZ = (H_1T)^H(H_1T) = T^HH_1^HH_1T = T^HI_mT = T^HT.$$


From theorem 27,

$$(dA) = 2^m\prod_{k=1}^{m}T_{kk}^{2(m-k)+1}\,(dT).$$

Note that $A = A^H$. Thus the exterior product of the elements of $A$ consists of the exterior product of the lower triangular submatrix of $A$. Partitioning that exterior product into diagonal and below-diagonal elements, we get $(dA) = (dA_D)(dA_L)$


(2”nr«) 

/m— 1  m  \ 

A  A  dxJ 

\  ,=i  j=i+i  / 

L  (<IAd)  J 

L  (dAi)  J 

We  will  substitute  {dAp)  into  (dZ).  We  get  {dZ)  = 


rp2n—3m+k 

U=1 


n  nr-‘> 

Lfe=i 


/\dTiA{H^dHr) 

K'd  / 


nrp2n—3m+k 
^kk 


Lfc=l 


=  (det  yl)(2"-3’")/2 


nr4 

.fc=l 


^dTi,]YdAL){H”dH^) 

)Tii  +  dAyll  (dAi,){H''dH,) 


We do not get the nice form given in Muirhead [187] for the case of real variables because the diagonal elements of $H_1^H(dH_1)$ are not zero. Each diagonal element $h_i^H dh_i$ is purely imaginary, because $H_1^H(dH_1)$ is skew-Hermitian (differentiate $H_1^HH_1 = I_m$). □
Theorem 34 Let $X = BYC$ be a complex change of variables where $X \in \mathbb{C}^{n\times m}$, $Y \in \mathbb{C}^{n\times m}$, $B \in \mathbb{C}^{n\times n}$, and $C \in \mathbb{C}^{m\times m}$. Then

$$|J(X \to Y)| = |\det B|^{2m}\,|\det C|^{2n}.$$

This is a complexification of Muirhead theorem 2.1.5, Deemer and Olkin [67] theorem 3.6, and Arnold [31] Theorem A.16. This is also Khatri's theorem 2.3 [137].

Proof. This is a complexification of Deemer and Olkin's proof. Let $Z = BY$ and $X = ZC$. Then $|J(Z \to Y)| = |\det B|^{2m}$ and $|J(X \to Z)| = |\det C|^{2n}$ by theorem 31. Then

$$|J(X \to Y)| = |J(Z \to Y)|\cdot|J(X \to Z)|.$$

This implies

$$|J(X \to Y)| = |\det B|^{2m}\,|\det C|^{2n}$$

and

$$|J(Y \to X)| = |\det B|^{-2m}\,|\det C|^{-2n}.$$
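The sketch below is a numerical sanity check of theorem 34 in plain Python, with no external libraries. It assumes the convention used throughout this appendix, that $|J(X \to Y)|$ is the factor in $(dX) = |J(X \to Y)|(dY)$. It treats $Y \mapsto BYC$ as a real-linear map on the $2nm$ real coordinates of $Y$, namely the real and imaginary part of each entry, and builds the real matrix of that map one basis direction at a time. It then compares the absolute determinant with $|\det B|^{2m}|\det C|^{2n}$. The helper names (`det`, `matmul`, `jacobian_BYC`) and the random test matrices are illustrative assumptions, not part of the thesis.

```python
import math
import random


def det(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[complex(x) for x in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0j  # numerically singular
        if p != c:
            A[c], A[p], d = A[p], A[c], -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return d


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def real_coords(X):
    """Real coordinates of a general complex matrix: (Re, Im) of each entry, row-major."""
    return [part for row in X for z in row for part in (z.real, z.imag)]


def jacobian_BYC(B, C):
    """|J(X -> Y)| for X = B Y C with Y in C^{n x m}: |det| of a real 2nm x 2nm matrix."""
    n, m = len(B), len(C)
    rows = []
    for p in range(n):
        for q in range(m):
            for unit in (1 + 0j, 1j):  # real and imaginary direction of entry (p, q)
                E = [[0j] * m for _ in range(n)]
                E[p][q] = unit
                rows.append(real_coords(matmul(matmul(B, E), C)))
    return abs(det(rows))  # rows = transposed Jacobian matrix; same determinant


def random_complex(rows, cols, rng):
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
n, m = 3, 2
B, C = random_complex(n, n, rng), random_complex(m, m, rng)
lhs = jacobian_BYC(B, C)
rhs = abs(det(B)) ** (2 * m) * abs(det(C)) ** (2 * n)
print(lhs, rhs)
```

Because the map is linear, the matrix built this way is the exact Jacobian matrix. The two printed values should therefore agree up to floating-point round-off.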


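Theorem 35 Let

$$X = \begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix} Y \begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}$$

be a transformation of complex variables between $X$ and $Y$. Let $Y \in \mathbb{C}^{n\times n}$, and let the left factor $\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}$ be the identity matrix of rank $r$ embedded in a null matrix of size $k\times n$ (the right factor is the corresponding $n\times k$ matrix), where $r < k$ and $r < n$. Then $|J(X \to Y)| = 0$.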


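Proof. Let $X$ equal

$$\begin{pmatrix} I_r & 0_{r\times(n-r)}\\ 0_{(k-r)\times r} & 0_{(k-r)\times(n-r)}\end{pmatrix}\begin{pmatrix} Y_{r\times r} & Y_{r\times(n-r)}\\ Y_{(n-r)\times r} & Y_{(n-r)\times(n-r)}\end{pmatrix}\begin{pmatrix} I_r & 0_{r\times(k-r)}\\ 0_{(n-r)\times r} & 0_{(n-r)\times(k-r)}\end{pmatrix} = \begin{pmatrix} Y_{r\times r} & 0\\ 0 & 0\end{pmatrix}.$$

In forming the exterior product, note that $dX_{nn} = 0\,dY_{nn}$. The zero factor causes the entire product to vanish. Thus $|J(X \to Y)| = 0$. Alternatively, if $\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}$ is $n\times n$, you could apply theorem 34. If $k = n = r$, then $|J(X \to Y)| = 1$. □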


Lemma 7 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = Y^H$ and let both $X$ and $Y$ be in $\mathbb{C}^{n\times n}$. Let $B \in \mathbb{C}^{n\times n}$ be an elementary transformation matrix $B = \mathrm{diag}(1, \ldots, 1, a, 1, \ldots, 1)$, with the element $a$ in the $i$th position. Then $|J(X \to Y)| = |a|^{2n}$ and $|J(Y \to X)| = |a|^{-2n}$. This theorem was motivated by Deemer and Olkin's proof [67] of their theorem 3.7, and by Stewart p. 43, Example 4-26.1 [259], which addresses operator matrices for a real matrix.
Proof.

$$X = BYB^H = E_1YE_1^H,$$

which is $Y$ with row $i$ multiplied by $a$ and column $i$ multiplied by $a^*$. In this matrix, there is one element ($Y_{ii}$) with a multiplier of $|a|^2$. There are $(i-1)$ elements in row $i$ multiplied by $a$ as coefficients of $Y_{ik}$ in the lower triangle to the left of the main diagonal. There are $(n-i)$ elements in column $i$ with $a^*$ as coefficients of $Y_{ki}$ below the main diagonal.

Use wedge products to compute the Jacobian of the transformation. To visualize the problem, first find the linear transformation for each element.

$$X_{ik} = X_{Rik} + iX_{Iik} = aY_{ik} = (a_R + ia_I)(Y_{Rik} + iY_{Iik})$$

Thus

$$\begin{aligned}
dX_{Rik}\wedge dX_{Iik} &= (a_R\,dY_{Rik} - a_I\,dY_{Iik})\wedge(a_I\,dY_{Rik} + a_R\,dY_{Iik})\\
&= a_Ra_I\,dY_{Rik}\wedge dY_{Rik} + a_R^2\,dY_{Rik}\wedge dY_{Iik} - a_I^2\,dY_{Iik}\wedge dY_{Rik} - a_Ia_R\,dY_{Iik}\wedge dY_{Iik}\\
&= (a_R^2 + a_I^2)\,dY_{Rik}\wedge dY_{Iik} = |a|^2\,dY_{Rik}\wedge dY_{Iik},
\end{aligned}$$

since by properties of the exterior product we know that $dZ\wedge dZ = 0$ and $dX\wedge dY = -dY\wedge dX$.

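Observe that

$$\bar X_{ik} = X_{Rik} - iX_{Iik} = (a_R - ia_I)(Y_{Rik} - iY_{Iik}) = (a_RY_{Rik} - a_IY_{Iik}) - i(a_IY_{Rik} + a_RY_{Iik}).$$

Thus the real and imaginary parts of $\bar X_{ik}$ have the differentials

$$a_R\,dY_{Rik} - a_I\,dY_{Iik} \qquad\text{and}\qquad -a_I\,dY_{Rik} - a_R\,dY_{Iik}.$$

When we take the wedge product of the forms for $X_{ik}$ with those for its conjugate $X_{ki} = \bar X_{ik}$, we observe that it goes to zero because we have repeated factors in our wedge product of

$$(dY_{Rik}\wedge dY_{Iik})\wedge(dY_{Rik}\wedge dY_{Iik}).$$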
Therefore we need only consider the lower triangular and diagonal elements in evaluating the Jacobian of this transformation.

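$$\begin{aligned}
X_{ji} = a^*Y_{ji} = X_{Rji} + iX_{Iji} &= (a_R + ia_I)^*(Y_{Rji} + iY_{Iji})\\
&= (a_R - ia_I)(Y_{Rji} + iY_{Iji})\\
&= (a_RY_{Rji} + a_IY_{Iji}) + i(a_RY_{Iji} - a_IY_{Rji})
\end{aligned}$$

$$dX_{Rji} = a_R\,dY_{Rji} + a_I\,dY_{Iji}, \qquad dX_{Iji} = a_R\,dY_{Iji} - a_I\,dY_{Rji}$$

$$\begin{aligned}
dX_{Rji}\wedge dX_{Iji} &= (a_R\,dY_{Rji} + a_I\,dY_{Iji})\wedge(a_R\,dY_{Iji} - a_I\,dY_{Rji})\\
&= a_R^2\,dY_{Rji}\wedge dY_{Iji} - a_I^2\,dY_{Iji}\wedge dY_{Rji} = (a_R^2 + a_I^2)\,dY_{Rji}\wedge dY_{Iji} = |a|^2\,dY_{Rji}\wedge dY_{Iji}
\end{aligned}$$

On the diagonal,

$$X_{ii} = |a|^2Y_{ii}, \qquad X_{Rii} + iX_{Iii} = |a|^2(Y_{Rii} + iY_{Iii}) = |a|^2Y_{Rii}, \qquad dX_{Rii} = |a|^2\,dY_{Rii}.$$

$Y_{ii} \in \mathbb{R}$ because $Y = Y^H$. Therefore $dX_{ii} = dX_{Rii}$. Finally, for the unaffected elements,

$$dX_{Rjk}\wedge dX_{Ijk} = dY_{Rjk}\wedge dY_{Ijk}.$$

Thus

$$(dX) = dX_{R11}\wedge dX_{R21}\wedge dX_{I21}\wedge\cdots\wedge dX_{Rnn} = (|a|^2)^{n-1}\,|a|^2\,(dY) = |a|^{2n}(dY).$$

This leads to our conclusions: $|J(X \to Y)| = |a|^{2n}$ and $|J(Y \to X)| = |a|^{-2n}$.

□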
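The counting argument in the proof of lemma 7 can be checked numerically. The plain-Python sketch below (all helper names are hypothetical) parameterizes Hermitian $n\times n$ matrices by the $n^2$ real coordinates used in the proof. These are the real diagonal elements plus the real and imaginary parts of each element below the main diagonal. The sketch pushes each basis direction through $Y \mapsto BYB^H$ and takes the absolute determinant of the resulting real matrix. The `skew` flag selects the skew-Hermitian parameterization (imaginary diagonal) instead. It is not needed for lemma 7 itself.

```python
import math


def det(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[complex(x) for x in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0j  # numerically singular
        if p != c:
            A[c], A[p], d = A[p], A[c], -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return d


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ctrans(A):
    """Conjugate transpose A^H."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def herm_basis(n, skew=False):
    """Real basis of Hermitian (skew=False) or skew-Hermitian (skew=True) n x n
    matrices, ordered like herm_coords."""
    basis = []
    for i in range(n):
        E = [[0j] * n for _ in range(n)]
        E[i][i] = 1j if skew else 1 + 0j
        basis.append(E)
        for j in range(i):
            for z in (1 + 0j, 1j):  # real and imaginary directions of element (i, j)
                E = [[0j] * n for _ in range(n)]
                E[i][j] = z
                E[j][i] = -z.conjugate() if skew else z.conjugate()
                basis.append(E)
    return basis


def herm_coords(X, skew=False):
    """n^2 real coordinates: Re (or Im, if skew) of each diagonal element,
    then Re and Im of each element below the main diagonal."""
    out = []
    for i in range(len(X)):
        out.append(X[i][i].imag if skew else X[i][i].real)
        for j in range(i):
            out += [X[i][j].real, X[i][j].imag]
    return out


def congruence_jacobian(B, skew=False):
    """|J(X -> Y)| for X = B Y B^H, as |det| of the real n^2 x n^2 matrix of the map."""
    BH = ctrans(B)
    rows = [herm_coords(matmul(matmul(B, E), BH), skew) for E in herm_basis(len(B), skew)]
    return abs(det(rows))  # rows = transposed Jacobian matrix; same determinant


# Lemma 7: B = diag(1, a, 1) scales row and column index 1 of a 3 x 3 Hermitian Y.
n, i, a = 3, 1, 1.5 - 0.5j
B = [[(a if r == i else 1) if r == c else 0 for c in range(n)] for r in range(n)]
print(congruence_jacobian(B), abs(a) ** (2 * n))
```

Here $|a|^2 = 2.5$, so both printed values should be close to $2.5^3$. The diagonal element contributes one factor $|a|^2$ and the $n-1$ off-diagonal elements of row and column $i$ contribute the others, as in the proof.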
Lemma 8 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = -Y^H$ and let both $X$ and $Y$ be in $\mathbb{C}^{n\times n}$. Let $B \in \mathbb{C}^{n\times n}$ be an elementary transformation matrix $B = \mathrm{diag}(1, \ldots, 1, a, 1, \ldots, 1)$, with the element $a \in \mathbb{C}$ in the $i$th position. Then $|J(X \to Y)| = |a|^{2n}$ and $|J(Y \to X)| = |a|^{-2n}$.

Proof. Follow the proof of lemma 7, taking into account that now $Y = -Y^H$ instead of $Y = Y^H$:

$$X = BYB^H = E_1YE_1^H.$$

In this case, we still have $(i-1)$ elements in row $i$ below the main diagonal that are multiplied by $a$. We have $(n-i)$ elements in column $i$ below the main diagonal that are multiplied by $a^*$. We have one element on the main diagonal at position $(i,i)$ that is multiplied by $|a|^2$. From lemma 7 we note that

$$dX_{Rji}\wedge dX_{Iji} = |a|^2\,dY_{Rji}\wedge dY_{Iji};$$

the difference in cases is in the treatment of the diagonal. $Y_{ii}$ is purely imaginary because $-Y = Y^H$. Thus $Y_{ii} \in i\mathbb{R}$. So

$$X_{ii} = |a|^2Y_{ii}, \qquad X_{Rii} + iX_{Iii} = |a|^2(Y_{Rii} + iY_{Iii}) = i\,|a|^2\,Y_{Iii}.$$

Thus

$$dX_{Iii} = |a|^2\,dY_{Iii},$$

since $Y_{Rii} = 0$. For unaffected elements,

$$dX_{Rjk}\wedge dX_{Ijk} = dY_{Rjk}\wedge dY_{Ijk}.$$

Thus

$$(dX) = dX_{I11}\wedge dX_{R21}\wedge\cdots\wedge dX_{Inn} = (|a|^2)^{n-1}\,|a|^2\,(dY) = |a|^{2n}(dY).$$

Therefore

$$|J(X \to Y)| = |a|^{2n} \qquad\text{and}\qquad |J(Y \to X)| = |a|^{-2n}.$$

□
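For lemma 8 the same numerical check applies with the skew-Hermitian parameterization. There the diagonal coordinates are the imaginary parts $Y_{Iii}$, since $Y_{Rii} = 0$. This is again a minimal plain-Python sketch with hypothetical helper names. The value of $a$ and its position are arbitrary choices.

```python
import math


def det(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[complex(x) for x in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0j  # numerically singular
        if p != c:
            A[c], A[p], d = A[p], A[c], -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return d


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ctrans(A):
    """Conjugate transpose A^H."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def herm_basis(n, skew=False):
    """Real basis of Hermitian (skew=False) or skew-Hermitian (skew=True) n x n
    matrices, ordered like herm_coords."""
    basis = []
    for i in range(n):
        E = [[0j] * n for _ in range(n)]
        E[i][i] = 1j if skew else 1 + 0j
        basis.append(E)
        for j in range(i):
            for z in (1 + 0j, 1j):  # real and imaginary directions of element (i, j)
                E = [[0j] * n for _ in range(n)]
                E[i][j] = z
                E[j][i] = -z.conjugate() if skew else z.conjugate()
                basis.append(E)
    return basis


def herm_coords(X, skew=False):
    """n^2 real coordinates: Re (or Im, if skew) of each diagonal element,
    then Re and Im of each element below the main diagonal."""
    out = []
    for i in range(len(X)):
        out.append(X[i][i].imag if skew else X[i][i].real)
        for j in range(i):
            out += [X[i][j].real, X[i][j].imag]
    return out


def congruence_jacobian(B, skew=False):
    """|J(X -> Y)| for X = B Y B^H, as |det| of the real n^2 x n^2 matrix of the map."""
    BH = ctrans(B)
    rows = [herm_coords(matmul(matmul(B, E), BH), skew) for E in herm_basis(len(B), skew)]
    return abs(det(rows))  # rows = transposed Jacobian matrix; same determinant


# Lemma 8: B = diag(1, 1, a) acting on a 3 x 3 skew-Hermitian Y.
n, i, a = 3, 2, 0.5 + 1j
B = [[(a if r == i else 1) if r == c else 0 for c in range(n)] for r in range(n)]
print(congruence_jacobian(B, skew=True), abs(a) ** (2 * n))
```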

Proposition 34 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = \mathrm{diag}(Y_1, \ldots, Y_n)$ where $Y_i \in \mathbb{R}$. Let $X \in \mathbb{C}^{n\times n}$. Let $B \in \mathbb{C}^{n\times n}$ be an elementary transformation matrix $B = \mathrm{diag}(1, \ldots, 1, a, 1, \ldots, 1)$, with the element $a$ in the $i$th position and $a \in \mathbb{C}$. Then $|J(X \to Y)| = |a|^2$ and $|J(Y \to X)| = |a|^{-2}$.
Proof. Here $X = \mathrm{diag}(Y_1, \ldots, Y_{i-1}, |a|^2Y_i, Y_{i+1}, \ldots, Y_n)$. Thus

$$(dX) = dY_1\wedge\cdots\wedge dY_{i-1}\wedge|a|^2\,dY_i\wedge dY_{i+1}\wedge\cdots\wedge dY_n = |a|^2\,(dY).$$

So we see that $|J(X \to Y)| = |a|^2$ and $|J(Y \to X)| = |a|^{-2}$. □

Lemma 9 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = Y^H$ and let both $X$ and $Y$ be in $\mathbb{C}^{n\times n}$. Let $B \in \mathbb{C}^{n\times n}$ be the elementary transformation matrix

$$B = E_2 = I + a\,e_je_i^T,$$

which has the constant $a$ in column $i$ and row $j$. For some conformable matrix $Z$, $BZ$ has the effect of multiplying row $i$ of matrix $Z$ by the constant $a$, and adding that result to row $j$. Then

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$

This theorem was motivated by Deemer and Olkin's proof [67] of their theorem 3.7, and by Stewart p. 43, Example 4-26.3 [259], which addresses elementary operator matrices for a real matrix.
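Proof. $X = BYB^H = E_2YE_2^H$ agrees with $Y$ except in row $j$ and column $j$, where

$$X_{jk} = Y_{jk} + aY_{ik}\ \ (k \neq j), \qquad X_{kj} = Y_{kj} + a^*Y_{ki}\ \ (k \neq j), \qquad X_{jj} = Y_{jj} + aY_{ij} + a^*Y_{ji} + |a|^2Y_{ii}.$$

For an element $X_{kl}$ with $dX_{kl} = dX_{Rkl} + i\,dX_{Ikl}$, look separately at the real terms and the imaginary terms. For the unaffected elements ($k \neq j$, $l \neq j$),

$$dX_{Rkl} = dY_{Rkl}, \qquad dX_{Ikl} = dY_{Ikl}.$$

Form the wedge product of the real terms and the imaginary terms:

$$dX_{Rkl}\wedge dX_{Ikl} = dY_{Rkl}\wedge dY_{Ikl}.$$

We get the same result, except for a sign change, when we examine the conjugate element $X_{lk} = \bar X_{kl}$. Thus $dX_{kl}\wedge dX_{lk} = 0$. We therefore can restrict attention to the lower triangle plus diagonal portion of $X$.

For the affected element,

$$\begin{aligned}
dX_{ji} = dX_{Rji} + i\,dX_{Iji} &= dY_{ji} + a\,dY_{ii} = (dY_{Rji} + i\,dY_{Iji}) + (a_R + ia_I)(dY_{Rii} + i\,dY_{Iii})\\
&= dY_{Rji} + i\,dY_{Iji} + a_R\,dY_{Rii} - a_I\,dY_{Iii} + ia_I\,dY_{Rii} + ia_R\,dY_{Iii},
\end{aligned}$$

so that

$$dX_{Rji} = dY_{Rji} + a_R\,dY_{Rii} - a_I\,dY_{Iii}, \qquad dX_{Iji} = dY_{Iji} + a_I\,dY_{Rii} + a_R\,dY_{Iii}.$$

Note that

$$dX_{Rii}\wedge dX_{Iii}\wedge dX_{Rji} = dY_{Rii}\wedge dY_{Iii}\wedge dY_{Rji},$$

since the other terms of $dX_{Rji}$ vanish in the exterior product. Similarly,

$$dX_{Rii}\wedge dX_{Iii}\wedge dX_{Rji}\wedge dX_{Iji} = dY_{Rii}\wedge dY_{Iii}\wedge dY_{Rji}\wedge dY_{Iji}.$$

The terms with coefficients of $a$ drop out. We thus can say

$$dX_{Rji}\wedge dX_{Iji} = dY_{Rji}\wedge dY_{Iji}$$

without having to keep track of the other terms. Note that

$$X_{jj} = Y_{jj} + 2\,\mathrm{Re}(a^*Y_{ji}) + |a|^2Y_{ii}$$

is real. Thus

$$dX_{jj} = dX_{Rjj} = dY_{Rjj} + 2\,\mathrm{Re}(a^*\,dY_{ji}) + |a|^2\,dY_{Rii},$$

and, because the forms in $dY_{ji}$ and $dY_{ii}$ already appear in the wedge product, $dX_{Rjj}$ contributes $dY_{Rjj}$ by the same reasoning.

Combining all the wedge products, we get $(dX) = (dY)$, and thus $|J(X \to Y)| = 1$ and $|J(Y \to X)| = 1$.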
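Lemma 9 and proposition 35 both say that a shear $E_2 = I + a\,e_je_i^T$ leaves the volume element unchanged, for Hermitian and for skew-Hermitian $Y$. The plain-Python sketch below computes $|J(X \to Y)|$ under both parameterizations. It uses hypothetical helper names, an arbitrary $a$, and index pairs that include both $j < i$ and $j > i$. Each printed value should be 1 up to round-off.

```python
import math


def det(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[complex(x) for x in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0j  # numerically singular
        if p != c:
            A[c], A[p], d = A[p], A[c], -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return d


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ctrans(A):
    """Conjugate transpose A^H."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def herm_basis(n, skew=False):
    """Real basis of Hermitian (skew=False) or skew-Hermitian (skew=True) n x n
    matrices, ordered like herm_coords."""
    basis = []
    for i in range(n):
        E = [[0j] * n for _ in range(n)]
        E[i][i] = 1j if skew else 1 + 0j
        basis.append(E)
        for j in range(i):
            for z in (1 + 0j, 1j):  # real and imaginary directions of element (i, j)
                E = [[0j] * n for _ in range(n)]
                E[i][j] = z
                E[j][i] = -z.conjugate() if skew else z.conjugate()
                basis.append(E)
    return basis


def herm_coords(X, skew=False):
    """n^2 real coordinates: Re (or Im, if skew) of each diagonal element,
    then Re and Im of each element below the main diagonal."""
    out = []
    for i in range(len(X)):
        out.append(X[i][i].imag if skew else X[i][i].real)
        for j in range(i):
            out += [X[i][j].real, X[i][j].imag]
    return out


def congruence_jacobian(B, skew=False):
    """|J(X -> Y)| for X = B Y B^H, as |det| of the real n^2 x n^2 matrix of the map."""
    BH = ctrans(B)
    rows = [herm_coords(matmul(matmul(B, E), BH), skew) for E in herm_basis(len(B), skew)]
    return abs(det(rows))  # rows = transposed Jacobian matrix; same determinant


def shear(n, j, i, a):
    """E_2 = I + a e_j e_i^T: premultiplying adds a times row i to row j."""
    M = [[1 + 0j if r == c else 0j for c in range(n)] for r in range(n)]
    M[j][i] = a
    return M


pairs, a = [(3, 1), (0, 2), (2, 3)], 2 - 1j
for j, i in pairs:
    B = shear(4, j, i, a)
    print((j, i), congruence_jacobian(B), congruence_jacobian(B, skew=True))
```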


Proposition 35 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = -Y^H$ and let both $X$ and $Y$ be in $\mathbb{C}^{n\times n}$. Let $B \in \mathbb{C}^{n\times n}$ be the elementary transformation matrix $B = E_2 = I + a\,e_je_i^T$, which has the constant $a$ in column $i$ and row $j$. For some conformable matrix $Z$, $BZ$ has the effect of multiplying row $i$ of matrix $Z$ by the constant $a$, and adding that result to row $j$. Then

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$

Proof. The proof is almost the same as for lemma 9. The matrix $X$ looks slightly different because it, too, now is skew-Hermitian. What is different is that the sign of the coefficient of terms containing $a$ may be different. For terms having an $a$ that are not on the diagonal, the wedge product including the associated variable will have been computed earlier. The noticeable change occurs on the diagonal. The cell of interest is of the form

$$X_{jj} = X_{Rjj} + iX_{Ijj} = Y_{jj} + a^*Y_{ji} - aY_{ji}^* + |a|^2Y_{ii}$$

for $i < j$. We know $Y_{jj}$ and $Y_{ii}$ are imaginary numbers. Of interest is that

$$a^*Y_{ji} - aY_{ji}^* = 2i\,\mathrm{Im}(a^*Y_{ji})$$

is also imaginary. If $i > j$ then the sign will reverse. Since we are interested only in the absolute value of the Jacobian, we do not need to keep track of the signs in the wedge product. So the term $X_{jj}$ is imaginary.

Because all previous terms in $dX_{jj}$ have been included in a wedge product, we can say that $dX_{jj}$ contributes $dX_{Ijj} = dY_{Ijj}$. Combining all the wedge products, we get $(dX) = (dY)$. Thus

$$|J(X \to Y)| = |J(Y \to X)| = 1.$$

□
Example 4 For the case of $-Y^H = Y \in M_5(\mathbb{C})$ and $B = I + a\,e_4e_2^T$, where $a \in \mathbb{C}$, we get

$$BYB^H = \begin{pmatrix}
Y_{11} & -Y_{21}^* & -Y_{31}^* & -Y_{41}^* - a^*Y_{21}^* & -Y_{51}^*\\
Y_{21} & Y_{22} & -Y_{32}^* & -Y_{42}^* + a^*Y_{22} & -Y_{52}^*\\
Y_{31} & Y_{32} & Y_{33} & -Y_{43}^* + a^*Y_{32} & -Y_{53}^*\\
Y_{41} + aY_{21} & Y_{42} + aY_{22} & Y_{43} - aY_{32}^* & Y_{44} - aY_{42}^* + a^*Y_{42} + |a|^2Y_{22} & -Y_{54}^* - aY_{52}^*\\
Y_{51} & Y_{52} & Y_{53} & Y_{54} + a^*Y_{52} & Y_{55}
\end{pmatrix}.$$

Compute $(dX)$ in the order of

$$dX_{11},\ dX_{21},\ dX_{22},\ dX_{31},\ dX_{32},\ dX_{33},\ dX_{41},\ dX_{42},\ dX_{43},\ dX_{44},\ dX_{51},\ dX_{52},\ dX_{53},\ dX_{54},\ dX_{55}.$$

Taking advantage of $dY_{kl}\wedge dY_{kl} = 0$, and following this order, gives us a simple computation yielding $(dX) = (dY)$.

Proposition 36 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = \mathrm{diag}(Y_1, \ldots, Y_n)$, and let $X, B \in M_n(\mathbb{C})$, where $B$ is the elementary transformation matrix $B = I + a\,e_ie_j^T$. The constant $a \in \mathbb{C}$ is a complex number and $B_{ij} = a$, where $i$ and $j$ are fixed. Then

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$
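Proof. Here

$$X = BYB^H = Y + aY_j\,e_ie_j^T + a^*Y_j\,e_je_i^T + |a|^2Y_j\,e_ie_i^T,$$

so only two off-diagonal elements ($X_{ij} = aY_j$ and $X_{ji} = a^*Y_j$) are non-zero, and the diagonal element in position $i$ is $X_i = Y_i + |a|^2Y_j$. In evaluating the wedge products we have

$$dX_1\wedge\cdots\wedge dX_{i-1} = dY_1\wedge\cdots\wedge dY_{i-1},$$

where $1 \le j \le i-1$. Thus, when we get to including $dX_i$, we see that $dX_j = dY_j$ has already been accounted for. Therefore $dX_i$ contributes $dY_i$, and $|J(X \to Y)| = 1$. Thus

$$|J(Y \to X)| = 1.$$

□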

Lemma 10 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = Y^H$, and let $X, Y, B \in \mathbb{C}^{n\times n}$. Let $B$ be the elementary transformation matrix

$$B = I - e_ie_i^T - e_je_j^T + e_ie_j^T + e_je_i^T.$$

The diagonal of matrix $B$ has a zero at positions $i$ and $j$. Said differently, $B_{ii} = 0$ and $B_{jj} = 0$. $B$ has off-diagonal ones in positions $(i,j)$ and $(j,i)$, so that $B_{ij} = 1$ and $B_{ji} = 1$. For some matrix $Z$, $BZ$ has the effect of interchanging rows $i$ and $j$. The Jacobian of this transformation is $|J(X \to Y)| = 1$ and $|J(Y \to X)| = 1$. This theorem was motivated by Deemer and Olkin's proof of their theorem 3.7 [67], and by Stewart (p. 43) Example 4-26.2 [259], which addresses operator matrices for a real matrix.

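Proof. $X = BYB^H$ is $Y$ with rows $i$ and $j$ interchanged and columns $i$ and $j$ interchanged, so each element in $X$ is an element of $Y$ (possibly conjugated). Thus $(dX) = (dY)$ up to sign. Therefore $|J(X \to Y)| = 1$ and consequently $|J(Y \to X)| = 1$. □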

Lemma 11 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = -Y^H$, and let $X, Y, B \in \mathbb{C}^{n\times n}$. Let $B$ be an elementary transformation matrix that interchanges rows. Then

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$
Proof. $B$ is merely a permutation of two of the vectors in $(e_1, e_2, \ldots, e_n)$. Thus $BYB^H$ is merely a symmetric shuffle of the elements of $Y$. At worst, you only get sign changes in the wedge products. We are only interested in the absolute value of the Jacobian, so the sign changes are of no interest. □
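Lemmas 10 and 11 can be checked the same way. For a row interchange, the real Jacobian matrix is a signed permutation matrix: each lower-triangle coordinate of $X$ is a coordinate of $Y$, with the sign of an imaginary part flipped when the element comes from above the diagonal. Its determinant is therefore $\pm 1$. The following minimal plain-Python sketch uses hypothetical helper names and arbitrary index pairs.

```python
import math


def det(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[complex(x) for x in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0j  # numerically singular
        if p != c:
            A[c], A[p], d = A[p], A[c], -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return d


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ctrans(A):
    """Conjugate transpose A^H."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def herm_basis(n, skew=False):
    """Real basis of Hermitian (skew=False) or skew-Hermitian (skew=True) n x n
    matrices, ordered like herm_coords."""
    basis = []
    for i in range(n):
        E = [[0j] * n for _ in range(n)]
        E[i][i] = 1j if skew else 1 + 0j
        basis.append(E)
        for j in range(i):
            for z in (1 + 0j, 1j):  # real and imaginary directions of element (i, j)
                E = [[0j] * n for _ in range(n)]
                E[i][j] = z
                E[j][i] = -z.conjugate() if skew else z.conjugate()
                basis.append(E)
    return basis


def herm_coords(X, skew=False):
    """n^2 real coordinates: Re (or Im, if skew) of each diagonal element,
    then Re and Im of each element below the main diagonal."""
    out = []
    for i in range(len(X)):
        out.append(X[i][i].imag if skew else X[i][i].real)
        for j in range(i):
            out += [X[i][j].real, X[i][j].imag]
    return out


def congruence_jacobian(B, skew=False):
    """|J(X -> Y)| for X = B Y B^H, as |det| of the real n^2 x n^2 matrix of the map."""
    BH = ctrans(B)
    rows = [herm_coords(matmul(matmul(B, E), BH), skew) for E in herm_basis(len(B), skew)]
    return abs(det(rows))  # rows = transposed Jacobian matrix; same determinant


def swap(n, i, j):
    """E_3: the identity matrix with rows i and j interchanged."""
    M = [[1 + 0j if r == c else 0j for c in range(n)] for r in range(n)]
    M[i], M[j] = M[j], M[i]
    return M


pairs = [(0, 1), (1, 3), (0, 3)]
for i, j in pairs:
    B = swap(4, i, j)
    print((i, j), congruence_jacobian(B), congruence_jacobian(B, skew=True))
```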


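Example 5 Let $-Y^H = Y \in M_5(\mathbb{C})$ and let $B = (e_1, e_4, e_3, e_2, e_5)$. Then

$$BYB^H = \begin{pmatrix}
Y_{11} & -Y_{41}^* & -Y_{31}^* & -Y_{21}^* & -Y_{51}^*\\
Y_{41} & Y_{44} & Y_{43} & Y_{42} & -Y_{54}^*\\
Y_{31} & -Y_{43}^* & Y_{33} & Y_{32} & -Y_{53}^*\\
Y_{21} & -Y_{42}^* & -Y_{32}^* & Y_{22} & -Y_{52}^*\\
Y_{51} & Y_{54} & Y_{53} & Y_{52} & Y_{55}
\end{pmatrix}.$$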

Proposition 37 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = \mathrm{diag}(Y_1, Y_2, \ldots, Y_n)$ and let $X, Y, B \in \mathbb{C}^{n\times n}$. Let $B$ be the elementary transformation matrix

$$B = I - e_ie_i^T - e_je_j^T + e_ie_j^T + e_je_i^T.$$

Then

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$

Proof. Here $X$ is $Y$ with the diagonal elements $Y_i$ and $Y_j$ interchanged. Except for a sign change due to the permutation, the wedge products are identical. Thus the Jacobians are the same:

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$

□


Theorem 36 Inverses of Elementary Transformations.

Table C.3. Elementary Operator Matrices

$$\begin{array}{llll}
\text{Lemma} & \text{Operation} & \text{Operator} & \text{Remark}\\
7 & \text{Scaling matrix} & E_1 = I + (a-1)\,\delta_{ii} & a \text{ is in position } i\\
9 & \text{Row}_j \leftarrow \text{Row}_j + a\,\text{Row}_i & E_2 = I + a\,\delta_{ji} & a \text{ is in position } (j,i)\\
10 & \text{Swap Row}_i \text{ and Row}_j & E_3 = \sigma_{ij}(e_1, \ldots, e_n) & \text{Swap rows of } I
\end{array}$$

Let $\bar E$ denote the inverse of $E$. (This unusual notation is of temporary value in the conclusion. Once the conclusion is made, the bad notation can be forgotten.) Define elementary operator matrices $E_1, E_2, E_3$ as in table C.3, where $\delta_{kl}$ denotes the matrix with a one in position $(k,l)$ and zeros elsewhere. The exhaustive formula for each is given in the lemma indicated in column 1. $\sigma_{ij}(\cdot)$ is the permutation that exchanges elements $i$ and $j$ of the argument. With these definitions, we note that the following relationships hold between the elementary matrix and its inverse.

$\bar E_1 = (E_1)^{-1}$ is like $E_1$ with the reciprocal of the multiplication constant, $1/a$ (so $a \neq 0$).

$\bar E_2 = (E_2)^{-1}$ is like $E_2$ with the sign changed on the multiplication constant, $-a$.

$\bar E_3 = (E_3)^{-1} = E_3$.

An explicit definition follows:

$$\bar E_1 = I + \left(\tfrac{1}{a} - 1\right)\delta_{ii}, \qquad \bar E_2 = I - a\,\delta_{ji}, \qquad \bar E_3 = \sigma_{ij}(e_1, \ldots, e_n).$$
Thus, each elementary transformation has an inverse. The proof is by simple matrix multiplication.

Proof.

$$E_1\bar E_1 = I, \qquad E_2\bar E_2 = I, \qquad E_3\bar E_3 = I.$$

□
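The inverse relations of theorem 36 and table C.3 are easy to confirm by direct multiplication. The sketch below builds $E_1$, $E_2$ and $E_3$ as plain Python lists and checks that each product with its claimed inverse is the identity. The constructor names and the 0-based indices are assumptions made for illustration.

```python
def identity(n):
    return [[1 + 0j if r == c else 0j for c in range(n)] for r in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def E1(n, i, a):
    """Scaling matrix I + (a - 1) delta_ii: multiplies row i by a (a != 0)."""
    M = identity(n)
    M[i][i] = complex(a)
    return M


def E2(n, j, i, a):
    """I + a delta_ji (i != j): adds a times row i to row j."""
    M = identity(n)
    M[j][i] = complex(a)
    return M


def E3(n, i, j):
    """Identity matrix with rows i and j interchanged."""
    M = identity(n)
    M[i], M[j] = M[j], M[i]
    return M


def is_identity(M, tol=1e-12):
    return all(abs(M[r][c] - (1 if r == c else 0)) < tol
               for r in range(len(M)) for c in range(len(M)))


n, a = 4, 2 + 1j
products = {
    "E1 E1bar": matmul(E1(n, 2, a), E1(n, 2, 1 / a)),     # reciprocal constant
    "E2 E2bar": matmul(E2(n, 3, 0, a), E2(n, 3, 0, -a)),  # negated constant
    "E3 E3bar": matmul(E3(n, 1, 3), E3(n, 1, 3)),         # E3 is its own inverse
}
for name, product in products.items():
    print(name, is_identity(product))
```

The $E_2$ case works because $\delta_{ji}\delta_{ji} = 0$ when $i \neq j$, so the cross term $-a^2\delta_{ji}\delta_{ji}$ vanishes.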
Theorem 37 Let $A \in \mathcal{F}^{m\times n}$, $m \ge n$, be a matrix over field $\mathcal{F}$ (real or complex), of rank $r \le n$. Let $E_1$ be an elementary transformation matrix that multiplies a row by a constant when $A$ is premultiplied by $E_1$. Let $E_2$ be an elementary transformation that adds to one row some constant multiple of another row, when $A$ is premultiplied by $E_2$. Let $E_3$ interchange two rows of $A$ when $A$ is premultiplied by $E_3$. Let the notation of a superscript, such as $E_i^kA$, denote that matrix $A$ is premultiplied by a set of $k$ different elementary transformation matrices of type $i$. Then

$$E_3^{\,n-r}E_1^{\,r}E_2^{\,n}\,A\,E_3^{\,n-r} = \begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}$$

for appropriate choices of $\{E_i\}$. The concept is simple and the proof is tedious and thus omitted.


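Theorem 38 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = Y^H$, let $B, X, Y \in \mathbb{C}^{n\times n}$, and let $\mathrm{rank}(B) = r$. Then $|J(X \to Y)| = |\det B|^{2n}$ if $r = n$, and is zero if $r < n$. Likewise, when $r = n$, $|J(Y \to X)| = |\det B|^{-2n}$.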
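Proof. From corollary 6, $B$ can be written as

$$B = \bar E_2^{\,n}\,\bar E_1^{\,r}\,\bar E_3^{\,n-r}\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}\bar E_3^{\,n-r},$$

where the mark above the symbol indicates a matrix inverse, the superscript indicates the number of such matrices, and the subscript indicates the type of elementary transformation matrix. Then

$$X = \bar E_2^{\,n}\,\bar E_1^{\,r}\,\bar E_3^{\,n-r}\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}\bar E_3^{\,n-r}\,Y\,(\bar E_3^{\,n-r})^H\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}(\bar E_3^{\,n-r})^H(\bar E_1^{\,r})^H(\bar E_2^{\,n})^H.$$

Now, apply the theorems that describe the Jacobians of individual elementary transformations, noting the pattern as suggested by Deemer and Olkin theorem 3.7 [67]. The subscript to the left of a matrix in the notation to follow is merely an index. Let

$$Y_3 = ({}_{n-r}\bar E_3)\cdots({}_2\bar E_3)({}_1\bar E_3)\,Y\,({}_1\bar E_3)^H({}_2\bar E_3)^H\cdots({}_{n-r}\bar E_3)^H.$$

Each row interchange contributes a Jacobian $J_1 = J_2 = \cdots = J_{n-r} = 1$ (lemma 10), so

$$|J(Y_3 \to Y)| = J_{n-r}\cdots J_2J_1 = 1.$$

Next, let

$$Y_r = \begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}Y_3\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}.$$

By a previous theorem (theorem 35),

$$|J(Y_r \to Y_3)| = \begin{cases} 1, & \text{if } r = n,\\ 0, & \text{otherwise.}\end{cases}$$

If $|J(Y_r \to Y_3)| = 0$, then $|J(X \to Y)| = 0$ because $|J(X \to Y)|$ is the product of all the intermediate Jacobians. By lemma 7 we get

$$Y_1 = ({}_r\bar E_1)\cdots({}_2\bar E_1)({}_1\bar E_1)\,Y_r\,({}_1\bar E_1)^H({}_2\bar E_1)^H\cdots({}_r\bar E_1)^H,$$

with $J_1 = |b_1|^{2n}$, $J_2 = |b_2|^{2n}$, $\ldots$, $J_r = |b_r|^{2n}$. Thus, for $r = n$,

$$|J(Y_1 \to Y_r)| = \prod_{i=1}^{r}J_i = \prod_{i=1}^{n}|b_i|^{2n},$$

where $b_i$ is the complex constant multiplier appearing in the diagonal of the $i$th elementary transformation of type $\bar E_1$. Finally,

$$X = ({}_n\bar E_2)\cdots({}_2\bar E_2)({}_1\bar E_2)\,Y_1\,({}_1\bar E_2)^H({}_2\bar E_2)^H\cdots({}_n\bar E_2)^H,$$

with $J_1 = J_2 = \cdots = J_n = 1$ (lemma 9). Therefore $|J(X \to Y_1)| = 1$ and

$$|J(X \to Y)| = |J(X \to Y_1)|\cdot|J(Y_1 \to Y_r)|\cdot|J(Y_r \to Y_3)|\cdot|J(Y_3 \to Y)| = \begin{cases} \prod_{i=1}^{n}|b_i|^{2n}, & \text{if } r = n,\\ 0, & \text{otherwise.}\end{cases}$$

Now, consider the determinant of $B$:

$$\det B = \det(\bar E_2^{\,n})\,\det(\bar E_1^{\,r})\,\det(\bar E_3^{\,n-r})\,\det\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}\det(\bar E_3^{\,n-r}) = \begin{cases} \prod_{i=1}^{n}b_i, & \text{if } r = n,\\ 0, & \text{otherwise,}\end{cases}$$

since each $\bar E_2$ has determinant 1, each $\bar E_1$ has determinant $b_i$, and the two factors $(-1)^{n-r}$ from the row interchanges cancel. Hence, for $r = n$,

$$(\det B)(\det B)^* = |\det B|^2 = \prod_{i=1}^{n}b_ib_i^* = \prod_{i=1}^{n}|b_i|^2.$$

Let $B$ be of full rank $r = n$. Then

$$|J(X \to Y)| = \prod_{i=1}^{n}|b_i|^{2n} = |\det B|^{2n}.$$

Also, $|J(Y \to X)| = |\det B|^{-2n}$.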

Note  that  this  theorem  is  for  matrices  that  are  unstructured.  When  B 
has  a  special  structure,  such  as  being  triangular,  then  that  structure  must  be 
accounted  for  in  determining  the  Jacobian. 
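Theorem 38, and corollary 7 that follows it, can be spot-checked numerically. The plain-Python sketch below uses hypothetical helper names; the random seed, the rank-2 matrix and the unitary matrix are arbitrary choices. It compares $|J(X \to Y)|$ for Hermitian $Y$ with $|\det B|^{2n}$ for a random full-rank $B$. It then shows that the Jacobian vanishes for a rank-deficient $B$, and that it equals 1 for a unitary $B$ whose determinant is $-1$ rather than 1.

```python
import math
import random


def det(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[complex(x) for x in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0j  # numerically singular
        if p != c:
            A[c], A[p], d = A[p], A[c], -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return d


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ctrans(A):
    """Conjugate transpose A^H."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def herm_basis(n, skew=False):
    """Real basis of Hermitian (skew=False) or skew-Hermitian (skew=True) n x n
    matrices, ordered like herm_coords."""
    basis = []
    for i in range(n):
        E = [[0j] * n for _ in range(n)]
        E[i][i] = 1j if skew else 1 + 0j
        basis.append(E)
        for j in range(i):
            for z in (1 + 0j, 1j):  # real and imaginary directions of element (i, j)
                E = [[0j] * n for _ in range(n)]
                E[i][j] = z
                E[j][i] = -z.conjugate() if skew else z.conjugate()
                basis.append(E)
    return basis


def herm_coords(X, skew=False):
    """n^2 real coordinates: Re (or Im, if skew) of each diagonal element,
    then Re and Im of each element below the main diagonal."""
    out = []
    for i in range(len(X)):
        out.append(X[i][i].imag if skew else X[i][i].real)
        for j in range(i):
            out += [X[i][j].real, X[i][j].imag]
    return out


def congruence_jacobian(B, skew=False):
    """|J(X -> Y)| for X = B Y B^H, as |det| of the real n^2 x n^2 matrix of the map."""
    BH = ctrans(B)
    rows = [herm_coords(matmul(matmul(B, E), BH), skew) for E in herm_basis(len(B), skew)]
    return abs(det(rows))  # rows = transposed Jacobian matrix; same determinant


rng = random.Random(7)
n = 3
B = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)] for _ in range(n)]
singular = [[1, 2, 0], [2, 4, 0], [0, 0, 1]]   # rank 2
U = [[0.6j, -0.8], [-0.8, 0.6j]]                # unitary with det U = -1
print(congruence_jacobian(B), abs(det(B)) ** (2 * n))   # full rank: |det B|^(2n)
print(congruence_jacobian(singular))                    # rank-deficient: 0
print(det(U), congruence_jacobian(U))                   # unitary: 1
```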


Corollary 7 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = Y^H$, and let $X$, $Y$, and $B$ be in $\mathbb{C}^{n\times n}$. Let $B$ be unitary. Then

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$

Proof. From theorem 38, $|J(X \to Y)| = |\det B|^{2n}$. Recall that the determinant of any unitary matrix has modulus 1. Therefore

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$

□

In particular, when considering the eigenvalue decomposition $X = U\Lambda U^H$, we see that $|J(X \to \Lambda)| = 1$ and $|J(\Lambda \to X)| = 1$, where $U$ is a fixed unitary matrix.


Theorem 39 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = -Y^H$, and let $X$, $Y$, and $B$ be in $\mathbb{C}^{n\times n}$. Let $\mathrm{rank}(B) = r$. Then $|J(X \to Y)| = |\det B|^{2n}$ and $|J(Y \to X)| = |\det B|^{-2n}$ when $r = n$. When $r < n$, $|J(X \to Y)| = 0$. This is a complexification of Muirhead's theorem 2.1.7 [187], which is stated without proof.


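Proof. If $Y = -Y^H$, then the diagonal of $Y$ is pure imaginary. Thus, in following the proof of theorem 38, we note that

$$B = \bar E_2^{\,n}\,\bar E_1^{\,r}\,\bar E_3^{\,n-r}\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}\bar E_3^{\,n-r},$$

where the accent on top of the $E$ indicates matrix inverse, the subscript indicates the type of the elementary transformation matrix, and the superscript indicates the number of matrices of that particular type. Then

$$X = \bar E_2^{\,n}\,\bar E_1^{\,r}\,\bar E_3^{\,n-r}\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}\bar E_3^{\,n-r}\,Y\,(\bar E_3^{\,n-r})^H\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}(\bar E_3^{\,n-r})^H(\bar E_1^{\,r})^H(\bar E_2^{\,n})^H.$$

To help compute the Jacobian, expand the matrices, subscripting the specific matrices on the left. Let

$$Y_3 = ({}_{n-r}\bar E_3)\cdots({}_2\bar E_3)({}_1\bar E_3)\,Y\,({}_1\bar E_3)^H({}_2\bar E_3)^H\cdots({}_{n-r}\bar E_3)^H.$$

From lemma 11 we know that the Jacobian $|J(Y_3 \to Y)| = 1$. Now, let

$$Y_r = \begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}Y_3\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}.$$

We know

$$|J(Y_r \to Y_3)| = \begin{cases} 1, & \text{if } r = n,\\ 0, & \text{otherwise.}\end{cases}$$

Suppose $r = n$. Consider $Y_1$ next, with lemma 8. Let

$$Y_1 = ({}_r\bar E_1)\cdots({}_2\bar E_1)({}_1\bar E_1)\,Y_r\,({}_1\bar E_1)^H({}_2\bar E_1)^H\cdots({}_r\bar E_1)^H,$$

with $J_1 = |b_1|^{2n}$, $J_2 = |b_2|^{2n}$, $\ldots$, $J_r = |b_r|^{2n}$. Thus

$$|J(Y_1 \to Y_r)| = \prod_{i=1}^{r}J_i = \prod_{i=1}^{n}|b_i|^{2n}.$$

Finally,

$$X = ({}_n\bar E_2)\cdots({}_2\bar E_2)({}_1\bar E_2)\,Y_1\,({}_1\bar E_2)^H({}_2\bar E_2)^H\cdots({}_n\bar E_2)^H,$$

with $J_1 = J_2 = \cdots = J_n = 1$ (proposition 35). We see that $|J(X \to Y_1)| = 1$. Putting it all together,

$$|J(X \to Y)| = |J(X \to Y_1)|\cdot|J(Y_1 \to Y_r)|\cdot|J(Y_r \to Y_3)|\cdot|J(Y_3 \to Y)| = \begin{cases} \prod_{i=1}^{n}|b_i|^{2n}, & \text{for } r = n,\\ 0, & \text{otherwise.}\end{cases}$$

From the proof of theorem 38, if $\mathrm{rank}(B) = n$, then $|\det B|^{2n} = \prod_{i=1}^{n}|b_i|^{2n}$. Thus

$$|J(X \to Y)| = |\det B|^{2n} \qquad\text{and}\qquad |J(Y \to X)| = |\det B|^{-2n}.$$ □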
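The skew-Hermitian counterpart (theorem 39, together with corollary 8 below) can be checked with the same kind of sketch, switching to the imaginary-diagonal parameterization. This is an illustrative plain-Python check, not part of the original proof. The helper names are hypothetical and the test matrices are arbitrary.

```python
import math
import random


def det(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[complex(x) for x in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0j  # numerically singular
        if p != c:
            A[c], A[p], d = A[p], A[c], -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return d


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ctrans(A):
    """Conjugate transpose A^H."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def herm_basis(n, skew=False):
    """Real basis of Hermitian (skew=False) or skew-Hermitian (skew=True) n x n
    matrices, ordered like herm_coords."""
    basis = []
    for i in range(n):
        E = [[0j] * n for _ in range(n)]
        E[i][i] = 1j if skew else 1 + 0j
        basis.append(E)
        for j in range(i):
            for z in (1 + 0j, 1j):  # real and imaginary directions of element (i, j)
                E = [[0j] * n for _ in range(n)]
                E[i][j] = z
                E[j][i] = -z.conjugate() if skew else z.conjugate()
                basis.append(E)
    return basis


def herm_coords(X, skew=False):
    """n^2 real coordinates: Re (or Im, if skew) of each diagonal element,
    then Re and Im of each element below the main diagonal."""
    out = []
    for i in range(len(X)):
        out.append(X[i][i].imag if skew else X[i][i].real)
        for j in range(i):
            out += [X[i][j].real, X[i][j].imag]
    return out


def congruence_jacobian(B, skew=False):
    """|J(X -> Y)| for X = B Y B^H, as |det| of the real n^2 x n^2 matrix of the map."""
    BH = ctrans(B)
    rows = [herm_coords(matmul(matmul(B, E), BH), skew) for E in herm_basis(len(B), skew)]
    return abs(det(rows))  # rows = transposed Jacobian matrix; same determinant


rng = random.Random(11)
n = 3
B = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)] for _ in range(n)]
singular = [[1, 2, 0], [2, 4, 0], [0, 0, 1]]   # rank 2
U = [[0.6j, -0.8], [-0.8, 0.6j]]                # unitary with det U = -1
print(congruence_jacobian(B, skew=True), abs(det(B)) ** (2 * n))
print(congruence_jacobian(singular, skew=True))
print(congruence_jacobian(U, skew=True))
```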
Discussion. In Muirhead's text (pp. 58-59) [187] attention is restricted to real matrices. Thus the diagonal of Muirhead's $Y = -Y^T$ is zero. Compared to $Y = Y^T$, his skew-symmetric matrix has fewer algebraically independent variables ($n(n-1)/2$ rather than $n(n+1)/2$). The effect is that the Jacobians of $BYB^T$ for the two cases are different.

The complex case is simpler. The Hermitian $Y$ has pure reals on the diagonal, whereas the skew-Hermitian $Y$ has pure imaginary numbers on the diagonal. The Hermitian and the skew-Hermitian matrices have the same number of algebraically independent real variables, namely $n^2$. The fact that the Jacobians turn out to be the same in the complex case is thus consistent with this observation.

Corollary 8 Let $X = BYB^H$ be a transformation of complex variables between $X$ and $Y$. Let $Y = -Y^H$, and let $X$, $Y$, and $B$ be in $\mathbb{C}^{n\times n}$. Let $B$ be unitary ($B^H = B^{-1}$). Then $|J(X \to Y)| = 1 = |J(Y \to X)|$.

Proof. From theorem 39, $|J(X \to Y)| = |\det B|^{2n}$. Since $B$ is unitary, $|\det B| = 1$. Therefore

$$|J(X \to Y)| = 1 = |J(Y \to X)|.$$

□


Theorem 40 Let $X = Y^{-1}$ be a complex change of variables between $X$ and $Y$. Let $X, Y \in \mathbb{C}^{n\times n}$ be nonsingular Hermitian matrices. Then

$$|J(X \to Y)| = |\det X|^{2n}.$$

This is a complexification of Muirhead's theorem 2.1.8 [187].

Proof. This is a complexification of Muirhead's proof. $X = Y^{-1}$ implies $YX = I$. We compute the matrix differential of this according to theorem 21 to find that

$$(dY)X + Y(dX) = 0.$$

This implies

$$(dY) = -Y(dX)X^{-1} = -X^{-1}(dX)X^{-1}.$$

By theorem 38, applied with $B = X^{-1} = (X^{-1})^H$, $|J(Y \to X)| = |\det X^{-1}|^{2n}$. Therefore $|J(Y \to X)| = |\det X|^{-2n}$ and $|J(X \to Y)| = |\det X|^{2n}$. □
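Theorem 40 can also be checked without appealing to theorem 38, by differentiating $Y = X^{-1}$ numerically. The sketch below assumes, as the proof does, that $X$ is Hermitian. It uses a $2\times 2$ example with $\det X = 4$, central finite differences along each of the $n^2 = 4$ real coordinate directions, and hypothetical helper names. The result estimates $|J(Y \to X)| = |\det X|^{-2n} = 4^{-4} = 1/256$, whose reciprocal is $|J(X \to Y)| = |\det X|^{2n}$.

```python
import math


def det(M):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    A = [[complex(x) for x in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            return 0j  # numerically singular
        if p != c:
            A[c], A[p], d = A[p], A[c], -d  # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return d


def herm_basis(n):
    """Real basis of Hermitian n x n matrices, ordered like herm_coords."""
    basis = []
    for i in range(n):
        E = [[0j] * n for _ in range(n)]
        E[i][i] = 1 + 0j
        basis.append(E)
        for j in range(i):
            for z in (1 + 0j, 1j):
                E = [[0j] * n for _ in range(n)]
                E[i][j], E[j][i] = z, z.conjugate()
                basis.append(E)
    return basis


def herm_coords(X):
    """Re of each diagonal element, then Re and Im of each element below the diagonal."""
    out = []
    for i in range(len(X)):
        out.append(X[i][i].real)
        for j in range(i):
            out += [X[i][j].real, X[i][j].imag]
    return out


def inv2(M):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (p, q), (r, s) = M
    d = p * s - q * r
    return [[s / d, -q / d], [-r / d, p / d]]


X = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian, det X = 6 - 2 = 4
n, h = 2, 1e-5
rows = []
for E in herm_basis(n):
    Xp = [[X[r][c] + h * E[r][c] for c in range(n)] for r in range(n)]
    Xm = [[X[r][c] - h * E[r][c] for c in range(n)] for r in range(n)]
    # central difference of the coordinates of Y = X^{-1} along the direction E
    fp, fm = herm_coords(inv2(Xp)), herm_coords(inv2(Xm))
    rows.append([(u - v) / (2 * h) for u, v in zip(fp, fm)])
J_Y_X = abs(det(rows))   # estimates |J(Y -> X)| = |det X|^(-2n)
print(J_Y_X, abs(det(X)) ** (-2 * n), 1 / J_Y_X)
```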


Appendix  D 
DISTRIBUTIONS, PART I

D.1 Complex Normal Distribution Introduction

A derivation of the probability density function for the vector complex normal distribution is given by Wooding [293] for the zero mean case. This is the form used by Goodman [92]. The most complete, notationally consistent summaries of basic results readily available are given by Anderson (problem 3.64) [26] and by Monzingo and Miller (appendix E.2) [186]. I strongly recommend that Goodman [92] be used as the source reference from which other results are constructed. He is a careful author.

Close  reading  of  the  literature  is  required  if  results  of  papers  are  to  be 
compared  on  an  equal  basis.  In  particular,  pay  attention  to  the  following 
issues.  Are  the  results  for  the  complex  case  being  presented  in  terms  of  real  or 
complex  variables?  Is  the  assumed  covariance  matrix  of  special  form,  or  is  it 
general?  Answers  to  these  questions  will  explain  the  variations  in  formulations 
of  the  characteristic  function,  as  well  as  possibly  other  results. 

The key to understanding the complex characteristic function is that each complex random variable can be thought of as two paired real random variables. Since a characteristic function is a special type of expected value, the integration of the associated density function must be carried out over all the space its random variable is defined for. Hence, the integration must be carried out over the entire complex plane. Equivalently, a double integration over $\mathbb{R}$ is required.

Not all vector complex normal distributions are the same. We are concerned with a very special complex normal distribution, which is motivated by signal processing needs. I will follow the explanation provided by Goodman [92].


D.1.1 Definition of a Vector Complex Random Variable

A complex normal random variable is a complex variable whose real and imaginary parts are bivariate Gaussian distributed. Let $Z = X + iY$ be a $p$-variate column vector complex normal random variable. A $p$-variate complex normal random variable is a $p$-tuple of complex normal random variables such that the vector of real and imaginary parts

$$U = \begin{pmatrix} X\\ Y\end{pmatrix}$$

is a $2p$-variate real normal random variable with a special covariance structure.
Wooding provided the following explanation. Consider

$$Z_n(t) = X_n(t) + iY_n(t).$$

This can be written in the form

$$Z_n(t) = \sum_k (C_{nk} - iD_{nk})\,e^{i\theta_k(t)},$$

where the coefficients $C_{nk}$ and $D_{nk}$ are real. This complex Fourier series arises in numerous fields, particularly in theory related to time series. Expanding this into its real and imaginary parts yields

$$X_n(t) = \sum_k\left[C_{nk}\cos(\theta_k(t)) + D_{nk}\sin(\theta_k(t))\right],$$

$$Y_n(t) = \sum_k\left[C_{nk}\sin(\theta_k(t)) - D_{nk}\cos(\theta_k(t))\right].$$

The $X_n$ and $Y_n$ are in phase quadrature. (Note: Wooding's $Y_n(t)$ is the negative of what I am reporting here.) The covariance matrix satisfies the following relations:

$$\mathcal{E}\{X_mX_n\} = \mathcal{E}\{Y_mY_n\}, \qquad \mathcal{E}\{X_mY_n\} = -\mathcal{E}\{Y_mX_n\}.$$

When $\mathcal{E}\{X_m\} = \mathcal{E}\{Y_m\} = 0$, the definition of the complex variates will not involve Fourier series concepts.


When this restriction on the covariance matrix is made, reordering the elements of the real variable vector representation of the complex vector yields a covariance matrix with a special pattern. The vector $\begin{pmatrix} X\\ Y\end{pmatrix}$ has a normal distribution with mean vector $\begin{pmatrix} \mu_X\\ \mu_Y\end{pmatrix}$ and covariance matrix

$$S = \begin{pmatrix} G & -F\\ F & G\end{pmatrix},$$

where $G$ is positive definite and $F = -F^T$ (skew symmetric). Then $Z = X + iY$ is said to have a complex normal distribution with mean $\mu = \mu_X + i\mu_Y$ and covariance matrix

$$\Sigma = \mathcal{E}\{(Z - \mu)(Z - \mu)^H\},$$

where $H$ denotes the Hermitian (complex conjugate) transpose. Note that $\Sigma$ is Hermitian and positive definite. We can also express $\Sigma$ as the sum of real and imaginary parts, $\Sigma = Q + iR$.
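The construction in D.1.1 can be illustrated by sampling. The sketch below uses plain Python and the standard library only. The function names, the example $\Sigma$, the mean, the seed and the sample size are assumptions for illustration. It forms the real covariance $S = \begin{pmatrix} G & -F\\ F & G\end{pmatrix}$ with $G = \tfrac12\mathrm{Re}\,\Sigma$ and $F = \tfrac12\mathrm{Im}\,\Sigma$. It then draws $U = (X; Y)$ from the $2p$-variate real normal distribution through a Cholesky factor of $S$, and forms $Z = X + iY$. The sample version of $\mathcal{E}\{(Z-\mu)(Z-\mu)^H\}$ should be close to $\Sigma$.

```python
import math
import random


def realify_cov(Sigma):
    """Real 2p x 2p covariance S = [[G, -F], [F, G]] of U = (X; Y), with
    G = Re(Sigma) / 2 and F = Im(Sigma) / 2."""
    p = len(Sigma)
    S = [[0.0] * (2 * p) for _ in range(2 * p)]
    for j in range(p):
        for k in range(p):
            g, f = Sigma[j][k].real / 2, Sigma[j][k].imag / 2
            S[j][k] = S[p + j][p + k] = g  # G blocks on the diagonal
            S[j][p + k] = -f               # upper-right block -F
            S[p + j][k] = f                # lower-left block F
    return S


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_complex_normal(mu, Sigma, size, rng):
    """Draw Z = X + iY by sampling U = (X; Y) from a 2p-variate real normal with covariance S."""
    p = len(Sigma)
    L = cholesky(realify_cov(Sigma))
    draws = []
    for _ in range(size):
        g = [rng.gauss(0.0, 1.0) for _ in range(2 * p)]
        u = [sum(L[r][c] * g[c] for c in range(r + 1)) for r in range(2 * p)]
        draws.append([mu[j] + complex(u[j], u[p + j]) for j in range(p)])
    return draws


Sigma = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian positive definite (eigenvalues 4 and 1)
mu = [0j, 1 + 2j]
Z = sample_complex_normal(mu, Sigma, 40000, random.Random(1))
N = len(Z)
mean = [sum(z[j] for z in Z) / N for j in range(2)]
Sigma_hat = [[sum((z[j] - mean[j]) * (z[k] - mean[k]).conjugate() for z in Z) / N
              for k in range(2)] for j in range(2)]
print(Sigma_hat)
```

The factor $\tfrac12$ in $S$ is what makes $\Sigma$ correspond to $2S$. Each complex variance collects one real variance from $X$ and one from $Y$.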
D.1.2 Proof that $\Sigma$ is Isomorphic to $2S$

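This is Anderson problem 3.64(a) [26]. Let $\begin{pmatrix} X\\ Y\end{pmatrix}$ have a real multivariate normal distribution with mean vector $\begin{pmatrix} \mu_X\\ \mu_Y\end{pmatrix}$ and covariance

$$S = \begin{pmatrix} G & -F\\ F & G\end{pmatrix},$$

where $G$ is positive definite and $F = -F^T$. Then, let $Z = X + iY$ have a special vector complex normal distribution with mean $\mu = \mu_X + i\mu_Y$ and covariance matrix

$$\Sigma = \mathcal{E}\{(Z - \mu)(Z - \mu)^H\},$$

where $\Sigma$ is positive definite. Then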


$$\begin{aligned}
\Sigma &= \mathcal{E}\{(Z - \mu)(Z - \mu)^H\} = \mathcal{E}\{(X + iY - \mu_X - i\mu_Y)(X + iY - \mu_X - i\mu_Y)^H\}\\
&= \mathcal{E}\{(X + iY - \mu_X - i\mu_Y)(X^T - iY^T - \mu_X^T + i\mu_Y^T)\}.
\end{aligned}$$

We expand and group all the real terms together and all the imaginary terms together.

$$\begin{aligned}
\Sigma &= \mathcal{E}\{XX^T - \mu_XX^T - X\mu_X^T + \mu_X\mu_X^T + YY^T - \mu_YY^T - Y\mu_Y^T + \mu_Y\mu_Y^T\}\\
&\quad + i\,\mathcal{E}\{YX^T - \mu_YX^T - Y\mu_X^T + \mu_Y\mu_X^T - XY^T + \mu_XY^T + X\mu_Y^T - \mu_X\mu_Y^T\}\\
&= Q + iR.
\end{aligned}$$

We look for a more compact expression for the real terms and the imaginary terms.

$$\begin{aligned}
\Sigma &= \mathcal{E}\{(X - \mu_X)(X - \mu_X)^T + (Y - \mu_Y)(Y - \mu_Y)^T\} + i\,\mathcal{E}\{(Y - \mu_Y)(X - \mu_X)^T - (X - \mu_X)(Y - \mu_Y)^T\}\\
&= \mathcal{E}\left\{\begin{pmatrix} X - \mu_X & Y - \mu_Y\end{pmatrix}\begin{pmatrix} (X - \mu_X)^T\\ (Y - \mu_Y)^T\end{pmatrix}\right\} + i\,\mathcal{E}\left\{\begin{pmatrix} X - \mu_X & Y - \mu_Y\end{pmatrix}\begin{pmatrix} -(Y - \mu_Y)^T\\ (X - \mu_X)^T\end{pmatrix}\right\}\\
&= Q + iR.
\end{aligned}$$
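We recombine terms to see if we can get further simplification in our notation:

$$\begin{aligned}
\Sigma &= \mathcal{E}\left\{\begin{pmatrix} X-\mu_X & Y-\mu_Y \end{pmatrix}\left[\begin{pmatrix} (X-\mu_X)^T \\ (Y-\mu_Y)^T \end{pmatrix} + i\begin{pmatrix} -(Y-\mu_Y)^T \\ (X-\mu_X)^T \end{pmatrix}\right]\right\} \\
&= \mathcal{E}\left\{\begin{pmatrix} X-\mu_X & Y-\mu_Y \end{pmatrix}\begin{pmatrix} (X-\mu_X)^T - i(Y-\mu_Y)^T \\ (Y-\mu_Y)^T + i(X-\mu_X)^T \end{pmatrix}\right\}.
\end{aligned}$$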

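Now, express the complex vector as a partitioned real vector and compute the covariance matrix:

$$\begin{aligned}
\mathcal{S} = \begin{pmatrix} G & -F \\ F & G \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}
&= \mathcal{E}\left\{\begin{pmatrix} X-\mu_X \\ Y-\mu_Y \end{pmatrix}\begin{pmatrix} (X-\mu_X)^T & (Y-\mu_Y)^T \end{pmatrix}\right\} \\
&= \mathcal{E}\begin{pmatrix} (X-\mu_X)(X^T-\mu_X^T) & (X-\mu_X)(Y^T-\mu_Y^T) \\ (Y-\mu_Y)(X^T-\mu_X^T) & (Y-\mu_Y)(Y^T-\mu_Y^T) \end{pmatrix} \\
&= \mathcal{E}\begin{pmatrix} XX^T - \mu_X X^T - X\mu_X^T + \mu_X\mu_X^T & XY^T - \mu_X Y^T - X\mu_Y^T + \mu_X\mu_Y^T \\ YX^T - \mu_Y X^T - Y\mu_X^T + \mu_Y\mu_X^T & YY^T - \mu_Y Y^T - Y\mu_Y^T + \mu_Y\mu_Y^T \end{pmatrix}.
\end{aligned}$$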
From the problem statement, the covariance matrix must have the special form in which $A = D$ and $C = -B$. Then $A = D$ implies

$$\mathcal{E}\{XX^T - \mu_X X^T - X\mu_X^T + \mu_X\mu_X^T\} = \mathcal{E}\{YY^T - \mu_Y Y^T - Y\mu_Y^T + \mu_Y\mu_Y^T\},$$

and $C = -B$ implies

$$\mathcal{E}\{YX^T - \mu_Y X^T - Y\mu_X^T + \mu_Y\mu_X^T\} = -\mathcal{E}\{XY^T - \mu_X Y^T - X\mu_Y^T + \mu_X\mu_Y^T\} = \mathcal{E}\{-XY^T + \mu_X Y^T + X\mu_Y^T - \mu_X\mu_Y^T\}.$$

Observe that $C = B^T$ for any covariance matrix, and by the special requirement we impose $C = -B$. Together these say $B^T = -B$, that is, $B = -F$ is skew symmetric, which is possible since $B$ and $C$ are square matrices of the same dimension. This special condition can hold in signal processing applications.

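To demonstrate the required equalities, we know from the requirement

$$\begin{pmatrix} G & -F \\ F & G \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

that we need

$$2G = A + D, \qquad 2F = C - B.$$

Thus we compute

$$\begin{aligned}
2G = A + D &= \mathcal{E}\{XX^T - \mu_X X^T - X\mu_X^T + \mu_X\mu_X^T + YY^T - \mu_Y Y^T - Y\mu_Y^T + \mu_Y\mu_Y^T\}, \\
Q &= \mathcal{E}\{XX^T - \mu_X X^T - X\mu_X^T + \mu_X\mu_X^T + YY^T - \mu_Y Y^T - Y\mu_Y^T + \mu_Y\mu_Y^T\},
\end{aligned}$$

and

$$\begin{aligned}
2F = C - B &= \mathcal{E}\{YX^T - \mu_Y X^T - Y\mu_X^T + \mu_Y\mu_X^T - XY^T + \mu_X Y^T + X\mu_Y^T - \mu_X\mu_Y^T\}, \\
R &= \mathcal{E}\{YX^T - \mu_Y X^T - Y\mu_X^T + \mu_Y\mu_X^T - XY^T + \mu_X Y^T + X\mu_Y^T - \mu_X\mu_Y^T\}.
\end{aligned}$$

We observe that $Q = 2G$ and $R = 2F$, so $\Sigma = 2(G + iF)$. Every element of $\Sigma$ is therefore double the corresponding element of $G + iF$, and the real $2p \times 2p$ representation of $\Sigma$, $\begin{pmatrix} Q & -R \\ R & Q \end{pmatrix}$, equals $2\mathcal{S}$. So there is a one-to-one mapping between the elements of $\mathcal{S}$ and $\Sigma$, and we say that $\Sigma$ is isomorphic to $2\mathcal{S}$.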
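As a numerical sanity check of the result $\Sigma = 2(G + iF)$, the following sketch (assuming NumPy; the particular $G$, $F$, sample size, and seed are arbitrary illustrative choices, and `estimate_complex_cov` is a hypothetical helper) draws zero-mean samples of $(X^T, Y^T)^T$ from the real covariance $\mathcal{S} = \begin{pmatrix} G & -F \\ F & G \end{pmatrix}$, forms $Z = X + iY$, and compares the sample estimate of $\mathcal{E}\{ZZ^H\}$ with $2(G + iF)$.

```python
import numpy as np

def estimate_complex_cov(G, F, n_samples=200_000, seed=0):
    """Sample (X, Y) ~ N(0, [[G, -F], [F, G]]) and return the sample
    estimate of E{Z Z^H} for Z = X + iY (zero mean assumed)."""
    p = G.shape[0]
    S_real = np.block([[G, -F], [F, G]])      # 2p x 2p real covariance
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(np.zeros(2 * p), S_real, size=n_samples)
    Z = W[:, :p] + 1j * W[:, p:]              # each row is one sample of Z^T
    return Z.T @ Z.conj() / n_samples         # average of Z Z^H over the samples

G = np.array([[2.0, 0.5], [0.5, 1.0]])        # symmetric positive definite
F = np.array([[0.0, 0.3], [-0.3, 0.0]])       # skew symmetric
Sigma_hat = estimate_complex_cov(G, F)
Sigma_theory = 2 * (G + 1j * F)               # Q + iR with Q = 2G, R = 2F
print(np.round(Sigma_hat, 2))
print(Sigma_theory)
```

The agreement is only up to Monte Carlo error, so the comparison uses a loose tolerance.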

D.1.3  Density of Vector Complex Normal Distribution

Anderson problem 3.64(d) [26] defines the probability density function for the $p$-variate special vector complex normal distribution $CN_p(\mu, \Sigma)$ as

$$f(z) = \frac{1}{\pi^p\det(\Sigma)}\exp\left[-(z-\mu)^H\Sigma^{-1}(z-\mu)\right].$$

The density for the special vector complex normal distribution differs from the real-variable case in three ways. First, the exponent uses the Hermitian transpose. Second, the exponent term is not divided in half. Third, the term preceding the exponent does not have a square root.

Recall that $\mathcal{S}$ is a $2p \times 2p$ matrix, and that the real $2p \times 2p$ representation of $\Sigma$ is $2\mathcal{S}$. Since the determinant of the real representation of a complex matrix equals the squared modulus of its determinant, and $\det(\Sigma)$ is real and positive, we know

$$[\det(\Sigma)]^2 = \det(2\mathcal{S}) = 2^{2p}\det(\mathcal{S}),$$

which implies

$$\det(\Sigma) = 2^p[\det(\mathcal{S})]^{1/2}$$

or

$$[\det(\mathcal{S})]^{1/2} = 2^{-p}\det(\Sigma).$$

This result is different from theorem 2.5 of Goodman [92]. The $2\pi$ factor in the leading term of the density function for the real $2p$-variate case is raised to the $\frac{2p}{2} = p$ power. Thus, the complete leading term in the denominator is

$$(2\pi)^p\,2^{-p}\det(\Sigma) = \pi^p\det(\Sigma).$$
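The determinant bookkeeping can be checked numerically. This sketch (assuming NumPy; $G$ and $F$ are arbitrary illustrative values) builds $\mathcal{S} = \begin{pmatrix} G & -F \\ F & G \end{pmatrix}$ and $\Sigma = 2(G + iF)$, then confirms $[\det\Sigma]^2 = \det(2\mathcal{S}) = 2^{2p}\det\mathcal{S}$ and that the real and complex leading constants agree, $(2\pi)^p[\det\mathcal{S}]^{1/2} = \pi^p\det\Sigma$.

```python
import numpy as np

G = np.array([[2.0, 0.5], [0.5, 1.0]])     # symmetric positive definite
F = np.array([[0.0, 0.3], [-0.3, 0.0]])    # skew symmetric
p = G.shape[0]

S_real = np.block([[G, -F], [F, G]])       # 2p x 2p real covariance
Sigma = 2 * (G + 1j * F)                   # p x p complex covariance

det_Sigma = np.linalg.det(Sigma).real      # real and positive for Hermitian PD Sigma
det_S = np.linalg.det(S_real)

# [det(Sigma)]^2 = det(2 S) = 2^{2p} det(S)
print(det_Sigma ** 2, np.linalg.det(2 * S_real), 2 ** (2 * p) * det_S)

# Leading constants: (2 pi)^p [det S]^{1/2} (real form) versus pi^p det(Sigma) (complex form)
real_leading = (2 * np.pi) ** p * np.sqrt(det_S)
complex_leading = np.pi ** p * det_Sigma
print(real_leading, complex_leading)
```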

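D.1.4  Characteristic Function for the Standardized Vector Complex Normal Distribution

Derivation from the Standard Univariate Complex Normal Density Function

Let the random variable $z = x + iy$ have the univariate special complex normal distribution with zero mean and unit variance, $CN_1(0, 1)$. The density function is given by

$$f(z) = \frac{1}{\pi}\exp(-z^* z) = \frac{1}{\pi}\exp[-(x^2 + y^2)].$$

Let the transform parameter be $t = \eta + i\tau$. The characteristic function is computed by

$$\phi_z(t) = \mathcal{E}\{\exp[i\,\mathrm{Re}(t^H z)]\} = \frac{1}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp[i(\eta x + \tau y) - x^2 - y^2]\,dx\,dy = \frac{1}{\pi}\int_{-\infty}^{\infty}\exp[i\eta x - x^2]\,dx\int_{-\infty}^{\infty}\exp[i\tau y - y^2]\,dy.$$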
To solve the integral, note that the exponent can be placed into the form of a perfect square by the following standard trick:

$$-\left(x - \tfrac{1}{2}i\eta\right)^2 - \tfrac{1}{4}\eta^2 = i\eta x - x^2,$$

which implies that

$$i\eta x - x^2 = -\left(x - \tfrac{1}{2}i\eta\right)^2 - \tfrac{1}{4}\eta^2.$$

Therefore

$$\int_{-\infty}^{\infty} e^{i\eta x - x^2}\,dx = e^{-\eta^2/4}\int_{-\infty}^{\infty}\exp\left[-\left(x - \tfrac{1}{2}i\eta\right)^2\right]dx.$$

The integral is in the form of $\int\exp[-z^2]\,dz$. Note that the function $\exp(-z^2)$ is analytic everywhere in the complex plane, so by Cauchy's theorem $\oint\exp[-z^2]\,dz = 0$ around any closed contour. Consider the contour given in figure D.1.

The closed path of integration begins on the real axis at $-K$, follows the real axis to $+K$, descends parallel to the imaginary axis in the negative direction to $K - \frac{1}{2}i\eta$, transits parallel to the real axis in the negative direction to $-K - \frac{1}{2}i\eta$, and then returns to the starting point by ascending parallel to the imaginary axis to the real axis again at $-K$. Evaluating the integral along this contour yields

$$\oint e^{-z^2}\,dz = 0 = \int_{-K}^{K} e^{-x^2}\,dx - i\int_0^{\eta/2} e^{-(K - iu)^2}\,du + \int_{K}^{-K} e^{-(u - \frac{1}{2}i\eta)^2}\,du + i\int_{-\eta/2}^{0} e^{-(-K + iu)^2}\,du,$$

where a different appropriate change of variables has been made for each integral to simplify the limits. In the second integral, let $u = i(z - K)$. In the third integral, let $u = z + \frac{1}{2}i\eta$. In the last integral, let $u = -i(z + K)$. Examine what happens as $K$ goes to infinity.

Figure D.1. Integration Contour to Get Characteristic Function

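$$\lim_{K\to\infty}\left|\int_0^{\eta/2} e^{-(K - iu)^2}\,du\right| = \lim_{K\to\infty}\left|\int_0^{\eta/2} e^{-K^2 + u^2}e^{2iKu}\,du\right| \le \lim_{K\to\infty}\int_0^{\eta/2} e^{-K^2 + u^2}\,du = 0.$$

Similarly,

$$\lim_{K\to\infty}\left|\int_{-\eta/2}^{0} e^{-(-K + iu)^2}\,du\right| = \lim_{K\to\infty}\left|\int_{-\eta/2}^{0} e^{-K^2 + u^2}e^{2iKu}\,du\right| \le \lim_{K\to\infty}\int_{-\eta/2}^{0} e^{-K^2 + u^2}\,du = 0.$$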
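The following integral is well known: $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$. It is evaluated by looking at its square and performing a rectangular-to-polar coordinate conversion. I have not seen this trick used on any other integral.

$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-w^2}\,dw\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2 + w^2)}\,dx\,dw.$$

Let $x = r\cos\theta$, $w = r\sin\theta$, and $dx\,dw = r\,dr\,d\theta$. Then

$$\begin{aligned}
I^2 &= 4\int_0^{\pi/2}\int_0^{\infty} e^{-r^2}r\,dr\,d\theta = 4\left(\frac{\pi}{2}\right)\int_0^{\infty} e^{-r^2}r\,dr \\
&= -\pi\int_0^{\infty} e^{-r^2}(-2r)\,dr = -\pi e^{-r^2}\Big|_0^{\infty} = -\pi(0 - 1) = \pi.
\end{aligned}$$

Therefore, $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.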

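Substitute these individual results back into the equation for $\oint e^{-z^2}\,dz$, letting $K \to \infty$. Switch the limits on the remaining integral, thus changing its sign:

$$\oint e^{-z^2}\,dz = 0 = \sqrt{\pi} + 0 - \int_{-\infty}^{\infty} e^{-(u - \frac{1}{2}i\eta)^2}\,du + 0.$$

Switching notation of dummy variables from $u$ to $x$, we get

$$\int_{-\infty}^{\infty} e^{-(x - \frac{1}{2}i\eta)^2}\,dx = \sqrt{\pi}.$$

Substituting this result yields

$$\int_{-\infty}^{\infty}\exp[i\eta x - x^2]\,dx = \sqrt{\pi}\exp\left[-\tfrac{1}{4}\eta^2\right].$$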
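The closed form $\int_{-\infty}^{\infty}\exp[i\eta x - x^2]\,dx = \sqrt{\pi}\exp[-\frac{1}{4}\eta^2]$ can be spot-checked by quadrature. This is a minimal sketch assuming NumPy; the truncation to $[-12, 12]$ and the grid size are arbitrary choices (the integrand is negligible beyond that range), and `shifted_gaussian_integral` is a hypothetical helper using the plain trapezoidal rule.

```python
import numpy as np

def shifted_gaussian_integral(eta, half_width=12.0, n_points=20_001):
    """Trapezoidal-rule approximation of the integral of exp(i*eta*x - x^2)
    over the real line, truncated to [-half_width, half_width]."""
    x = np.linspace(-half_width, half_width, n_points)
    f = np.exp(1j * eta * x - x ** 2)
    dx = x[1] - x[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))

for eta in [0.0, 0.5, 2.0, 4.0]:
    numeric = shifted_gaussian_integral(eta)
    closed_form = np.sqrt(np.pi) * np.exp(-eta ** 2 / 4)
    print(eta, numeric, closed_form)
```

The imaginary part of the numerical result is zero up to rounding, as the closed form requires, because the integrand's imaginary part $\sin(\eta x)e^{-x^2}$ is odd.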


Continuing the back-substitution, we get the characteristic function

$$\phi_z(t) = \frac{1}{\pi}\left(\sqrt{\pi}\exp\left[-\tfrac{1}{4}\eta^2\right]\right)\left(\sqrt{\pi}\exp\left[-\tfrac{1}{4}\tau^2\right]\right) = \exp\left[-\tfrac{1}{4}(\eta^2 + \tau^2)\right].$$

Finally, for the univariate standard special complex normal distribution, the characteristic function is

$$\phi_z(t) = \exp\left[-\tfrac{1}{4}|t|^2\right]. \qquad \text{(D.2)}$$


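For the paired real vector $\begin{pmatrix} X \\ Y \end{pmatrix}$ with mean $\begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}$ and covariance $\mathcal{S}$, the characteristic function at $\begin{pmatrix} \eta \\ \tau \end{pmatrix}$ is

$$\phi(\eta, \tau) = \exp\left\{i(\eta^T\mu_X + \tau^T\mu_Y) - \tfrac{1}{2}\left[\eta^T G\eta - \eta^T F\tau + \tau^T F\eta + \tau^T G\tau\right]\right\}.$$

Regardless of whether the distribution is expressed as paired real vectors or by complex vectors, the density function and the characteristic function should always have the same value for corresponding identical parameters. With this in mind, let us examine similar forms with complex variables. Let $T = \eta + i\tau$. Then

$$T^H\mu = (\eta^T - i\tau^T)(\mu_X + i\mu_Y) = \eta^T\mu_X + \tau^T\mu_Y - i\tau^T\mu_X + i\eta^T\mu_Y.$$

Thus $\mathrm{Re}(T^H\mu) = \eta^T\mu_X + \tau^T\mu_Y$. Now examine the covariance term:

$$\begin{aligned}
T^H\Sigma T &= (\eta^T - i\tau^T)(Q + iR)(\eta + i\tau) \\
&= \eta^T Q\eta + \tau^T Q\tau + \tau^T R\eta - \eta^T R\tau + i\left[\eta^T R\eta + \tau^T R\tau - \tau^T Q\eta + \eta^T Q\tau\right].
\end{aligned}$$

Recall that we proved $R = 2F$ and $Q = 2G$. Then

$$T^H\Sigma T = 2\eta^T G\eta + 2\tau^T G\tau + 2\tau^T F\eta - 2\eta^T F\tau + i\left[2\eta^T F\eta + 2\tau^T F\tau - 2\tau^T G\eta + 2\eta^T G\tau\right].$$

Comparing this with the quadratic form in $\phi(\eta, \tau)$, it is seen that

$$\tfrac{1}{2}\mathrm{Re}(T^H\Sigma T) = \begin{pmatrix} \eta^T & \tau^T \end{pmatrix}\mathcal{S}\begin{pmatrix} \eta \\ \tau \end{pmatrix}.$$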

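Since $\Sigma$ is Hermitian positive definite, by theorem 119 it can be factored as $\Sigma = CC^H$. Then

$$T^H\Sigma T = T^H CC^H T = (C^H T)^H(C^H T),$$

which is real. Thus $T^H\Sigma T = \mathrm{Re}[T^H\Sigma T]$, and therefore the characteristic function for the special vector complex normal distribution is

$$\phi_Z(T) = \exp\left\{i\,\mathrm{Re}(T^H\mu) - \tfrac{1}{4}T^H\Sigma T\right\}. \qquad \text{(D.3)}$$

This differs from Anderson's result by the factor $\frac{1}{4}$ in the covariance term.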
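The equivalence between the paired-real and complex forms of the characteristic function can be checked directly. In this sketch (assuming NumPy; the random $G$, $F$, mean, and evaluation point are illustrative, and `complex_cf` and `real_cf` are hypothetical helper names), `complex_cf` evaluates $\exp\{i\,\mathrm{Re}(T^H\mu) - \frac{1}{4}T^H\Sigma T\}$ with $\Sigma = 2(G + iF)$, `real_cf` evaluates the characteristic function of the paired real vector with covariance $\mathcal{S} = \begin{pmatrix} G & -F \\ F & G \end{pmatrix}$, and the two are compared at the same point $T = \eta + i\tau$.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
G0 = rng.standard_normal((p, p))
G = G0 @ G0.T + p * np.eye(p)               # symmetric positive definite
F0 = rng.standard_normal((p, p))
F = 0.1 * (F0 - F0.T)                       # skew symmetric
S_real = np.block([[G, -F], [F, G]])        # covariance of the paired real vector
Sigma = 2 * (G + 1j * F)                    # complex covariance Q + iR
mu_x, mu_y = rng.standard_normal(p), rng.standard_normal(p)
mu = mu_x + 1j * mu_y

def complex_cf(T):
    """exp{ i Re(T^H mu) - (1/4) T^H Sigma T } for a complex vector T."""
    quad = np.real(T.conj() @ Sigma @ T)
    return np.exp(1j * np.real(T.conj() @ mu) - 0.25 * quad)

def real_cf(eta, tau):
    """exp{ i (eta^T mu_x + tau^T mu_y) - (1/2) w^T S w } with w = (eta, tau)."""
    w = np.concatenate([eta, tau])
    return np.exp(1j * (eta @ mu_x + tau @ mu_y) - 0.5 * (w @ S_real @ w))

eta, tau = 0.2 * rng.standard_normal(p), 0.2 * rng.standard_normal(p)
T = eta + 1j * tau
quad_complex = T.conj() @ Sigma @ T
w = np.concatenate([eta, tau])
print(quad_complex.imag)                    # zero up to rounding: T^H Sigma T is real
print(0.5 * quad_complex.real, w @ S_real @ w)
print(complex_cf(T), real_cf(eta, tau))
```

The evaluation point is scaled down so that the characteristic function is not vanishingly small, which keeps the comparison meaningful.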
D.2  Matrix  Complex  Normal  Distribution

The  matrix  complex  normal  distribution  describes  the  random  data  whose 
quadratic  form  produces  the  complex  Wishart  distribution.  In  order  to  define 
and  describe  the  properties  of  the  complex  Wishart  distribution,  the  definition 
and  properties  of  the  complex  Gaussian  distribution  must  be  understood.  The 
material  that  follows  is  a  complexification  of  Arnold’s  Section  17.2  [31].  I 
have  also  used  characteristic  functions  where  Arnold  used  moment  generating 
functions. 


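D.2.1  Definition of the Matrix Complex Normal Distribution

Let $Z = (Z_{ij})$ be an $n \times p$ random matrix such that the $\{Z_{ij}\}$ are independent and each $Z_{ij}$ is distributed according to the univariate complex normal distribution with zero mean and unit variance. Symbolically, we denote this as $Z_{ij} \sim CN_1(0, 1)$. The characteristic function from equation D.2 is

$$\phi_{Z_{ij}}(T_{ij}) = \exp\left[-\tfrac{1}{4}T_{ij}^*T_{ij}\right]. \qquad \text{(D.4)}$$

For $Z = (Z_{ij})_{n\times p}$ independent and identically distributed,

$$\phi_Z(T) = \prod_{j=1}^{p}\prod_{i=1}^{n}\exp\left[-\tfrac{1}{4}T_{ij}^*T_{ij}\right] = \exp\left[-\tfrac{1}{4}\operatorname{tr}(T^H T)\right]. \qquad \text{(D.5)}$$

Likewise, the density function (from equation D.1) is given by

$$f(Z) = \prod_{j=1}^{p}\prod_{i=1}^{n}\frac{1}{\pi}\exp\left[-Z_{ij}^*Z_{ij}\right] = \pi^{-np}\exp\left[-\operatorname{tr}(Z^H Z)\right]. \qquad \text{(D.6)}$$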

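Let $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{p\times r}$, $\mu \in \mathbb{C}^{m\times r}$, and $Y = AZB + \mu$. The symbol $\mu$ in statistics is often reserved to refer to the average or mean of a distribution, and it is often (but not necessarily) a parameter in the distribution functions. Then $Y \in \mathbb{C}^{m\times r}$ and

$$\begin{aligned}
\phi_Y(T) = \phi_{AZB+\mu}(T) &= \exp\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)]\}\,\phi_Z(A^H TB^H) \qquad \text{(D.7)} \\
&= \exp\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)]\}\exp\left\{-\tfrac{1}{4}\operatorname{tr}\left[(A^H TB^H)^H(A^H TB^H)\right]\right\} \\
&= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)] - \tfrac{1}{4}\operatorname{tr}(BT^H AA^H TB^H)\right\} \\
&= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)] - \tfrac{1}{4}\operatorname{tr}(T^H AA^H TB^H B)\right\}.
\end{aligned}$$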

Let

$$\Xi = AA^H \quad\text{and}\quad \Sigma = B^H B. \qquad \text{(D.8)}$$

These symbols have special meanings in statistics. The symbol $\Sigma$ is the more frequently used symbol, and it represents the covariance matrix of a distribution. When more than one covariance matrix is being discussed, the symbol $\Xi$ is often the symbol of choice. In the case of the matrix complex normal distribution, Arnold [31] remarks that it helps to think of $\Xi$ as representing the covariance between the rows of $Y$, and $\Sigma$ as representing the covariance between the columns of $Y$. With these symbols defined,

$$\phi_Y(T) = \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)] - \tfrac{1}{4}\operatorname{tr}(T^H\Xi T\Sigma)\right\}. \qquad \text{(D.9)}$$
$Y$ has a special complex matrix normal distribution with parameters $\mu$, $\Xi$, and $\Sigma$. We denote this by $Y \sim CN_{m,r}(\mu, \Xi, \Sigma)$. Note that this parameterization is not unique. For an arbitrary real scalar $a > 0$ (so that both scaled matrices remain Hermitian positive definite), it is true that

$$Y \sim CN_{m,r}(\mu, \Xi, \Sigma) = CN_{m,r}\left(\mu, \tfrac{1}{a}\Xi, a\Sigma\right). \qquad \text{(D.10)}$$

The parameter $\Xi$ is often assumed known, and usually assumed to be the identity matrix. Note the dimensions of the parameters: $\mu_{m\times r}$, $\Xi_{m\times m}$, and $\Sigma_{r\times r}$.

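Proposition 38 If $z \sim CN_1(0, 1)$ then $z^* \sim CN_1(0, 1)$.

Proof. Let $z = x + iy$. Then $z \sim CN_1(0, 1)$ implies the expected value $\mathcal{E}(x + iy) = 0$, which implies $\mathcal{E}(x) = 0$ and $\mathcal{E}(y) = 0$. We also see that the variance is

$$\operatorname{var}(z) = \mathcal{E}\{(z - 0)(z - 0)^*\} = \mathcal{E}(zz^*) = \mathcal{E}[(x + iy)(x - iy)] = \mathcal{E}(x^2 + y^2) = 1.$$

$\mathcal{E}(x) = 0$ and $\mathcal{E}(y) = 0$ imply $\mathcal{E}(x - iy) = \mathcal{E}(z^*) = 0$. Then

$$1 = \mathcal{E}[(x + iy)(x - iy)] = \mathcal{E}(zz^*) = \mathcal{E}[z^*(z^*)^*] = \mathcal{E}[(z^* - 0)(z^* - 0)^*] = \operatorname{var}(z^*).$$

Therefore $\operatorname{var}(z) = \operatorname{var}(z^*)$. Since $z^* = x + i(-y)$ and $(x, -y)$ has the same joint normal distribution as $(x, y)$, $z^*$ is again special complex normal, so its mean and variance completely define its distribution. We conclude that

$$z \sim CN_1(0, 1) \implies z^* \sim CN_1(0, 1).$$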
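Proposition 39 Let $Y \sim CN_{m,r}(\mu, \Xi, \Sigma)$. Let $Y = (Y_{ij})$, $\mu = (\mu_{ij})$, $\Xi = (\Xi_{ij})$, and $\Sigma = (\Sigma_{ij})$. Then $\mathcal{E}(Y_{ij}) = \mu_{ij}$ and $\operatorname{cov}(Y_{ij}, Y_{i^*j^*}) = \Xi_{ii^*}\Sigma_{jj^*}^*$. This is the complexification of Arnold's theorem 17.1 [31].

Proof. This is a complexification of Arnold's proof. Let the $Z_{ij} \sim CN_1(0, 1)$ be independent. Then $\mathcal{E}(Z_{ij}) = 0$, $\operatorname{var}(Z_{ij}) = 1$, and $\operatorname{cov}(Z_{ij}, Z_{i^*j^*}) = 0$ unless $i = i^*$ and $j = j^*$. Define matrices $A$ and $B$ by $\Xi = AA^H$, $\Sigma = B^H B$. These factorizations exist for Hermitian positive definite $\Xi$ and $\Sigma$, as proven by theorem 119. Let $Y = AZB + \mu$. Then $Y \sim CN_{m,r}(\mu, \Xi, \Sigma)$ by construction. Element $Y_{ij}$ is given by

$$Y_{ij} = \sum_{k=1}^{m}\sum_{s=1}^{r}A_{ik}Z_{ks}B_{sj} + \mu_{ij}. \qquad \text{(D.11)}$$

Then

$$\operatorname{cov}(Y_{ij}, Y_{i^*j^*}) = \mathcal{E}\left\{[Y_{ij} - \mathcal{E}(Y_{ij})][Y_{i^*j^*} - \mathcal{E}(Y_{i^*j^*})]^*\right\}. \qquad \text{(D.12)}$$

We consider the details of one of the arguments and note that the other argument has similar results:

$$\mathcal{E}(Y_{ij}) = \mathcal{E}\left\{\sum_{k=1}^{m}\sum_{s=1}^{r}A_{ik}Z_{ks}B_{sj} + \mu_{ij}\right\} = \sum_{k=1}^{m}\sum_{s=1}^{r}A_{ik}\mathcal{E}\{Z_{ks}\}B_{sj} + \mu_{ij} = \mu_{ij},$$

which implies

$$Y_{ij} - \mathcal{E}(Y_{ij}) = \sum_{k=1}^{m}\sum_{s=1}^{r}A_{ik}Z_{ks}B_{sj}. \qquad \text{(D.13)}$$

We use this to evaluate the covariance term of D.12:

$$\operatorname{cov}(Y_{ij}, Y_{i^*j^*}) = \mathcal{E}\left\{\left(\sum_{k=1}^{m}\sum_{s=1}^{r}A_{ik}Z_{ks}B_{sj}\right)\left(\sum_{k'=1}^{m}\sum_{s'=1}^{r}A_{i^*k'}Z_{k's'}B_{s'j^*}\right)^*\right\}$$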
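$$= \sum_{k=1}^{m}\sum_{s=1}^{r}\sum_{k'=1}^{m}\sum_{s'=1}^{r}A_{ik}B_{sj}A_{i^*k'}^*B_{s'j^*}^*\,\underbrace{\mathcal{E}\{Z_{ks}Z_{k's'}^*\}}_{=0\text{ except when }k'=k,\ s'=s}
= \sum_{k=1}^{m}\sum_{s=1}^{r}A_{ik}B_{sj}A_{i^*k}^*B_{sj^*}^* = \sum_{k=1}^{m}A_{ik}A_{i^*k}^*\sum_{s=1}^{r}B_{sj}B_{sj^*}^* = \Xi_{ii^*}\Sigma_{jj^*}^*.$$

Therefore,

$$\operatorname{cov}(Y_{ij}, Y_{i^*j^*}) = \Xi_{ii^*}\Sigma_{jj^*}^*. \qquad \text{(D.14)}$$

The variance term is simpler to compute:

$$\operatorname{var}(Y_{ij}) = \operatorname{cov}(Y_{ij}, Y_{ij}) = \Xi_{ii}\Sigma_{jj}^* = \Xi_{ii}\Sigma_{jj}, \qquad \text{(D.15)}$$

since the diagonal elements of a Hermitian matrix are real.

By this theorem, then, the covariance between two elements $Y_{ij}$ and $Y_{i^*j^*}$ is just the covariance between the rows $i$ and $i^*$ multiplied by the conjugate of the covariance between the columns $j$ and $j^*$.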
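Equations D.14 and D.15 can be checked by simulation. The sketch below (assuming NumPy; the particular $\Xi$, $\Sigma$, sample size, and seed are arbitrary, and `sample_matrix_cn` and `sample_cov` are hypothetical helpers) uses the construction from the proof, $Y = AZB + \mu$ with $\Xi = AA^H$ and $\Sigma = B^H B$ taken from Cholesky factors, and $Z$ filled with independent $CN_1(0, 1)$ entries whose real and imaginary parts are each $N(0, \frac{1}{2})$.

```python
import numpy as np

def sample_matrix_cn(mu, Xi, Sigma, n_samples, rng):
    """Draw samples of Y = A Z B + mu, where Xi = A A^H, Sigma = B^H B, and Z has
    i.i.d. CN_1(0, 1) entries (real and imaginary parts each N(0, 1/2))."""
    m, r = mu.shape
    A = np.linalg.cholesky(Xi)                 # Xi = A A^H
    B = np.linalg.cholesky(Sigma).conj().T     # Sigma = L L^H = B^H B with B = L^H
    Z = (rng.standard_normal((n_samples, m, r))
         + 1j * rng.standard_normal((n_samples, m, r))) / np.sqrt(2)
    return A @ Z @ B + mu                      # matmul broadcasts over the sample axis

def sample_cov(Y, i, j, k, l):
    """Sample estimate of cov(Y_ij, Y_kl) = E{(Y_ij - E Y_ij)(Y_kl - E Y_kl)^*}."""
    a = Y[:, i, j] - Y[:, i, j].mean()
    b = Y[:, k, l] - Y[:, k, l].mean()
    return np.mean(a * b.conj())

rng = np.random.default_rng(3)
Xi = np.array([[1.0, 0.4j], [-0.4j, 2.0]])
Sigma = np.array([[1.5, 0.3 - 0.2j, 0.0],
                  [0.3 + 0.2j, 1.0, 0.1],
                  [0.0, 0.1, 0.8]])
mu = np.zeros((2, 3), dtype=complex)
Y = sample_matrix_cn(mu, Xi, Sigma, 200_000, rng)

# (D.14): cov(Y_ij, Y_kl) = Xi_ik * conj(Sigma_jl)
print(sample_cov(Y, 0, 1, 1, 0), Xi[0, 1] * np.conj(Sigma[1, 0]))
# (D.15): var(Y_ij) = Xi_ii * Sigma_jj
print(sample_cov(Y, 1, 2, 1, 2).real, Xi[1, 1].real * Sigma[2, 2].real)
```

The Cholesky factor is only one convenient choice of $A$ and $B$; any factors satisfying D.8 give the same distribution.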

D.2.2  Properties of the Matrix Complex Normal Distribution

The properties studied here are the complexification of Arnold's theorems 17.2 and 17.3 [31], plus some corollaries motivated by these theorems.

Lemma 12 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$. If $r = 1$ and $\Sigma = \sigma^2$ (a scalar), then $Z \sim CN_m(\mu, \sigma^2\Xi)$.

Proof. The characteristic function of $Z$ from equation D.9 is given by

$$\begin{aligned}
\phi_Z(T) &= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)] - \tfrac{1}{4}\operatorname{tr}(T^H\Xi T\Sigma)\right\} \\
&= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)] - \tfrac{1}{4}\operatorname{tr}(T^H[\sigma^2\Xi]T)\right\},
\end{aligned}$$

which is the characteristic function of $CN_m(\mu, \sigma^2\Xi)$ by equation D.3. This is a complexification of Arnold's theorem 17.2(a), which was stated without proof.

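Lemma 13 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$. If $\alpha$ is a scalar, then

$$\alpha Z \sim CN_{m,r}(\alpha\mu, |\alpha|^2\Xi, \Sigma) = CN_{m,r}(\alpha\mu, \alpha^*\Xi, \alpha\Sigma) = CN_{m,r}(\alpha\mu, \alpha\Xi, \alpha^*\Sigma) = CN_{m,r}(\alpha\mu, \Xi, |\alpha|^2\Sigma). \qquad \text{(D.16)}$$

(The two middle forms are valid parameter sets only when $\alpha$ is real and positive, since the row and column covariances must remain Hermitian positive definite; the outer two forms hold for any nonzero scalar $\alpha$.) This is a complexification of Arnold's theorem 17.2(b), which was stated without proof.

Proof. From the characteristic function, we observe

$$\phi_{\alpha Z}(T) = \mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[T^H(\alpha Z)])]\} = \mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[(\alpha^*T)^H Z])]\} = \phi_Z(\alpha^*T),$$

since $\alpha$ is a scalar. We continue by regrouping the scalar $\alpha$ to obtain the results. Since $\alpha$ and $\alpha^*$ are scalars, they commute:

$$\begin{aligned}
\phi_Z(\alpha^*T) &= \exp\left\{i\,\mathrm{Re}(\operatorname{tr}[(\alpha^*T)^H\mu]) - \tfrac{1}{4}\operatorname{tr}[(\alpha^*T)^H\Xi(\alpha^*T)\Sigma]\right\} \\
&= \exp\left\{i\,\mathrm{Re}(\operatorname{tr}[T^H(\alpha\mu)]) - \tfrac{1}{4}\alpha\alpha^*\operatorname{tr}[T^H\Xi T\Sigma]\right\}.
\end{aligned}$$

These are the characteristic functions of the distributions cited in the results.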
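The scaling rule can be confirmed numerically on the log scale of the characteristic function D.9. This sketch assumes NumPy; `log_cf` and `random_hpd` are hypothetical helpers, and the dimensions, seed, and value of $\alpha$ are arbitrary. It checks that $\phi_Z(\alpha^*T)$ equals the characteristic function of $CN_{m,r}(\alpha\mu, |\alpha|^2\Xi, \Sigma)$ and of $CN_{m,r}(\alpha\mu, \Xi, |\alpha|^2\Sigma)$.

```python
import numpy as np

def log_cf(T, mu, Xi, Sigma):
    """Logarithm of the characteristic function (D.9):
    i Re tr(T^H mu) - (1/4) tr(T^H Xi T Sigma)."""
    TH = T.conj().T
    return 1j * np.real(np.trace(TH @ mu)) - 0.25 * np.trace(TH @ Xi @ T @ Sigma)

def random_hpd(k, rng):
    """A random Hermitian positive definite k x k matrix (illustrative only)."""
    M = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return M @ M.conj().T + k * np.eye(k)

rng = np.random.default_rng(5)
m, r = 3, 2
Xi, Sigma = random_hpd(m, rng), random_hpd(r, rng)
mu = rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))
T = rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))
alpha = 0.7 - 1.2j

lhs = log_cf(np.conj(alpha) * T, mu, Xi, Sigma)                # phi_Z(alpha^* T)
rhs_row = log_cf(T, alpha * mu, abs(alpha) ** 2 * Xi, Sigma)   # CN(alpha mu, |alpha|^2 Xi, Sigma)
rhs_col = log_cf(T, alpha * mu, Xi, abs(alpha) ** 2 * Sigma)   # CN(alpha mu, Xi, |alpha|^2 Sigma)
print(lhs, rhs_row, rhs_col)
```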

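Corollary 9 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$. Then $Z^H \sim CN_{r,m}(\mu^H, \Sigma, \Xi)$. This is a complexification of Arnold's theorem 17.2(c), which was stated without proof.

Proof. Let $Z = AXB + \mu$. Then $Z^H = B^H X^H A^H + \mu^H$, where $X \sim CN(0, I, I)$; by proposition 38, $X^H$ also has independent $CN_1(0, 1)$ entries. The characteristic function of $Z^H$ is given by

$$\begin{aligned}
\phi_{Z^H}(T) &= \mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[T^H(B^H X^H A^H + \mu^H)])]\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^H])]\,\mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[A^H T^H B^H X^H])]\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^H])]\,\phi_{X^H}(BTA) \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^H])]\exp\left\{-\tfrac{1}{4}\operatorname{tr}[(BTA)^H(BTA)]\right\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^H])]\exp\left\{-\tfrac{1}{4}\operatorname{tr}[A^H T^H B^H BTA]\right\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^H])]\exp\left\{-\tfrac{1}{4}\operatorname{tr}[T^H B^H BTAA^H]\right\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^H])]\exp\left\{-\tfrac{1}{4}\operatorname{tr}[T^H\Sigma T\Xi]\right\},
\end{aligned}$$

which is the characteristic function of the result, where $\Xi = AA^H$ and $\Sigma = B^H B$ as used earlier.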

Theorem 41 (Very important) Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$, where the matrix dimensions are $Z_{m\times r}$, $\mu_{m\times r}$, $\Xi_{m\times m}$, $\Sigma_{r\times r}$. Let $Y = AZB + \nu$, where the dimensions are $Y_{n\times p}$, $A_{n\times m}$, $B_{r\times p}$, $\nu_{n\times p}$. Then

$$Y \sim CN_{n,p}(\nu + A\mu B, A\Xi A^H, B^H\Sigma B).$$

This is a complexification of Arnold's theorem 17.2(d) [31], which was stated without proof.
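Proof. From a practical standpoint, this is one of the most important results in this thesis for future use by others. The characteristic function is given by

$$\begin{aligned}
\phi_{AZB+\nu}(T) &= \mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[T^H(AZB + \nu)])]\} \\
&= \exp\{i\,\mathrm{Re}(\operatorname{tr}[T^H\nu])\}\,\mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[BT^H AZ])]\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\nu])]\,\phi_Z(A^H TB^H) \\
&= \exp\{i\,\mathrm{Re}(\operatorname{tr}[T^H\nu])\} \\
&\quad\times\exp\left\{i\,\mathrm{Re}(\operatorname{tr}[(A^H TB^H)^H\mu]) - \tfrac{1}{4}\operatorname{tr}[(A^H TB^H)^H\Xi(A^H TB^H)\Sigma]\right\} \\
&= \exp\left\{i\,\mathrm{Re}(\operatorname{tr}[T^H\nu] + \operatorname{tr}[BT^H A\mu]) - \tfrac{1}{4}\operatorname{tr}[BT^H A\Xi A^H TB^H\Sigma]\right\} \\
&= \exp\left\{i\,\mathrm{Re}(\operatorname{tr}[T^H(\nu + A\mu B)]) - \tfrac{1}{4}\operatorname{tr}[T^H(A\Xi A^H)T(B^H\Sigma B)]\right\} = \phi_Y(T), \qquad \text{(D.17)}
\end{aligned}$$

which is the characteristic function of the result.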
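Because theorem 41 is the workhorse for the results that follow, a numerical spot check of the identity D.17 may be useful. This sketch assumes NumPy; `log_cf`, `random_hpd`, and `crandn` are hypothetical helpers, and all matrices are random illustrative values with the dimensions stated in the theorem. It compares $\exp\{i\,\mathrm{Re}\operatorname{tr}(T^H\nu)\}\,\phi_Z(A^H TB^H)$ with the characteristic function of $CN_{n,p}(\nu + A\mu B, A\Xi A^H, B^H\Sigma B)$, both on the log scale.

```python
import numpy as np

def log_cf(T, mu, Xi, Sigma):
    """Logarithm of the characteristic function (D.9):
    i Re tr(T^H mu) - (1/4) tr(T^H Xi T Sigma)."""
    TH = T.conj().T
    return 1j * np.real(np.trace(TH @ mu)) - 0.25 * np.trace(TH @ Xi @ T @ Sigma)

def random_hpd(k, rng):
    """A random Hermitian positive definite k x k matrix (illustrative only)."""
    M = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return M @ M.conj().T + k * np.eye(k)

def crandn(shape, rng):
    """A random complex matrix with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(4)
m, r, n, p = 3, 4, 2, 5
Xi, Sigma = random_hpd(m, rng), random_hpd(r, rng)
mu = crandn((m, r), rng)
A, B, nu = crandn((n, m), rng), crandn((r, p), rng), crandn((n, p), rng)
T = crandn((n, p), rng)
AH, BH = A.conj().T, B.conj().T

# Left side: exp{i Re tr(T^H nu)} * phi_Z(A^H T B^H), on the log scale
lhs = 1j * np.real(np.trace(T.conj().T @ nu)) + log_cf(AH @ T @ BH, mu, Xi, Sigma)
# Right side: characteristic function of CN_{n,p}(nu + A mu B, A Xi A^H, B^H Sigma B)
rhs = log_cf(T, nu + A @ mu @ B, A @ Xi @ AH, BH @ Sigma @ B)
print(lhs, rhs)
```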

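Theorem 42 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$, where the matrix dimensions are $Z_{m\times r}$, $\mu_{m\times r}$, $\Xi_{m\times m}$, $\Sigma_{r\times r}$. Partition the random variable and parameters as follows. Let $Z = (Z_1, Z_2)$, $\mu = (\mu_1, \mu_2)$, and $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$, where $Z_1$ and $\mu_1$ are $m \times r_1$ and $\Sigma_{11}$ is $r_1 \times r_1$. Then $Z_1 \sim CN_{m,r_1}(\mu_1, \Xi, \Sigma_{11})$ and $Z_2 \sim CN_{m,(r-r_1)}(\mu_2, \Xi, \Sigma_{22})$. This is a complexification of Arnold's theorem 17.2(e) [31], which was stated without proof.

Proof. Let $B = \begin{pmatrix} I_{r_1} \\ 0 \end{pmatrix}$. Then

$$Z_1 = ZB \sim CN_{m,r_1}(\mu B, \Xi, B^H\Sigma B) = CN_{m,r_1}(\mu_1, \Xi, \Sigma_{11})$$

by theorem 41. Similarly, let $D = \begin{pmatrix} 0 \\ I_{r-r_1} \end{pmatrix}$. Then

$$Z_2 = ZD \sim CN_{m,(r-r_1)}(\mu D, \Xi, D^H\Sigma D) = CN_{m,(r-r_1)}(\mu_2, \Xi, \Sigma_{22}).$$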


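Theorem 43 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$. Partition the random variable and parameters as follows. Let $Z = (Z_1, Z_2)$, $\mu = (\mu_1, \mu_2)$, $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$, and $T = (T_1, T_2)$. Let $\Xi \neq 0$. Then $Z_1$ and $Z_2$ are independent if and only if $\Sigma_{12} = 0$. This is a complexification of Arnold's theorem 17.2(f) [31], which was stated without proof.

Proof. Working with the characteristic function, equation D.9, we see that

$$\begin{aligned}
\phi_Z(T) &= \exp\left\{i\,\mathrm{Re}\operatorname{tr}\left[\begin{pmatrix} T_1^H \\ T_2^H \end{pmatrix}(\mu_1, \mu_2)\right] - \tfrac{1}{4}\operatorname{tr}\left[\begin{pmatrix} T_1^H \\ T_2^H \end{pmatrix}\Xi(T_1, T_2)\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right]\right\} \\
&= \exp\left\{i\,\mathrm{Re}\operatorname{tr}\begin{pmatrix} T_1^H\mu_1 & T_1^H\mu_2 \\ T_2^H\mu_1 & T_2^H\mu_2 \end{pmatrix} - \tfrac{1}{4}\operatorname{tr}\left[\begin{pmatrix} T_1^H \\ T_2^H \end{pmatrix}\Xi\,(T_1\Sigma_{11} + T_2\Sigma_{21},\ T_1\Sigma_{12} + T_2\Sigma_{22})\right]\right\} \\
&= \exp\left\{i\,\mathrm{Re}(\operatorname{tr}[T_1^H\mu_1] + \operatorname{tr}[T_2^H\mu_2]) - \tfrac{1}{4}\operatorname{tr}[T_1^H\Xi T_1\Sigma_{11} + T_1^H\Xi T_2\Sigma_{21}] - \tfrac{1}{4}\operatorname{tr}[T_2^H\Xi T_1\Sigma_{12} + T_2^H\Xi T_2\Sigma_{22}]\right\} \\
&= \exp\left\{i\,\mathrm{Re}(\operatorname{tr}[T_1^H\mu_1]) - \tfrac{1}{4}\operatorname{tr}[T_1^H\Xi T_1\Sigma_{11}]\right\} \\
&\quad\times\exp\left\{i\,\mathrm{Re}(\operatorname{tr}[T_2^H\mu_2]) - \tfrac{1}{4}\operatorname{tr}[T_2^H\Xi T_2\Sigma_{22}]\right\} \\
&\quad\times\exp\left\{-\tfrac{1}{4}\operatorname{tr}[T_1^H\Xi T_2\Sigma_{21}] - \tfrac{1}{4}\operatorname{tr}[T_2^H\Xi T_1\Sigma_{12}]\right\} \\
&= \phi_{Z_1}(T_1)\,\phi_{Z_2}(T_2) \quad\text{if and only if } \Sigma_{12} = 0. \qquad \text{(D.18)}
\end{aligned}$$

(Recall that $\Sigma_{21} = \Sigma_{12}^H$, so $\Sigma_{12} = 0$ also removes the $\Sigma_{21}$ term.) Since random matrices are independent if and only if their joint characteristic function factors into the product of their marginal characteristic functions, $Z_1$ and $Z_2$ are independent if and only if $\Sigma_{12} = 0$.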


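Theorem 44 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$. Partition the random variable and parameters as follows. Let $Z = (Z_1, Z_2)$, $\mu = (\mu_1, \mu_2)$, $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$. Let $\Sigma_{22}$ be nonsingular and define $\Sigma_{11.2} \stackrel{\text{def}}{=} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Then the conditional distribution of $Z_1$ given $Z_2$ is given by

$$\begin{aligned}
(Z_1 \mid Z_2) &\sim CN_{m,r_1}\left(\mu_1 + (Z_2 - \mu_2)\Sigma_{22}^{-1}\Sigma_{21},\ \Xi,\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right) \\
&= CN_{m,r_1}\left(\mu_1 + (Z_2 - \mu_2)\Sigma_{22}^{-1}\Sigma_{21},\ \Xi,\ \Sigma_{11.2}\right).
\end{aligned}$$

This is a complexification of Arnold's theorem 17.2(g) [31], which was stated without proof.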
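Proof. Let

$$B = \begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1}\Sigma_{21} & I \end{pmatrix}$$

and consider the transformation

$$Y = (Y_1, Y_2) = ZB = (Z_1, Z_2)B.$$

Then

$$(Y_1, Y_2) = (Z_1 - Z_2\Sigma_{22}^{-1}\Sigma_{21},\ Z_2).$$

Thus $\mu B = (\mu_1 - \mu_2\Sigma_{22}^{-1}\Sigma_{21},\ \mu_2)$. The covariance is found by

$$\begin{aligned}
B^H\Sigma B &= \begin{pmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{pmatrix}\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1}\Sigma_{21} & I \end{pmatrix} \\
&= \begin{pmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} + \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22}\Sigma_{22}^{-1}\Sigma_{21} & \Sigma_{12} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22} \\ \Sigma_{21} - \Sigma_{22}\Sigma_{22}^{-1}\Sigma_{21} & \Sigma_{22} \end{pmatrix} \\
&= \begin{pmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}, \qquad \text{(D.19)}
\end{aligned}$$

where we have used $\Sigma = \Sigma^H$, so that $\Sigma_{21}^H = \Sigma_{12}$ and $\Sigma_{22}^H = \Sigma_{22}$. By theorem 41, $(Y_1, Y_2) \sim CN_{m,r}(\mu B, \Xi, B^H\Sigma B)$. By theorem 43, $Y_1$ and $Y_2$ are independent. Since $Y_1$ and $Y_2$ are independent, the density factors as

$$f(Y_1 \mid Y_2) = \frac{f(Y_1, Y_2)}{f(Y_2)} = \frac{f(Y_1)f(Y_2)}{f(Y_2)} = f(Y_1). \qquad \text{(D.20)}$$

Thus, using theorem 42 for the marginal distribution of $Y_1$,

$$(Y_1 \mid Y_2) \sim CN_{m,r_1}\left(\mu_1 - \mu_2\Sigma_{22}^{-1}\Sigma_{21},\ \Xi,\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right). \qquad \text{(D.21)}$$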
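Recall that $Y_2 = Z_2$ and $Y_1 = Z_1 - Z_2\Sigma_{22}^{-1}\Sigma_{21}$, which implies

$$Z_1 = Y_1 + Z_2\Sigma_{22}^{-1}\Sigma_{21}.$$

In the conditional density $f(Y_1 \mid Y_2 = Z_2)$, $Z_2$ is a constant. Apply theorem 41 to find $f(Z_1 \mid Z_2)$. Here, $\nu = Z_2\Sigma_{22}^{-1}\Sigma_{21}$. The row and column covariance matrices remain the same, and the mean becomes

$$\mu_1 - \mu_2\Sigma_{22}^{-1}\Sigma_{21} + Z_2\Sigma_{22}^{-1}\Sigma_{21} = \mu_1 + (Z_2 - \mu_2)\Sigma_{22}^{-1}\Sigma_{21}.$$

Thus

$$\begin{aligned}
(Z_1 \mid Z_2) &\sim CN_{m,r_1}\left(\mu_1 + (Z_2 - \mu_2)\Sigma_{22}^{-1}\Sigma_{21},\ \Xi,\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right) \\
&= CN_{m,r_1}\left(\mu_1 + (Z_2 - \mu_2)\Sigma_{22}^{-1}\Sigma_{21},\ \Xi,\ \Sigma_{11.2}\right). \qquad \text{(D.22)}
\end{aligned}$$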
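The parameters in theorem 44 are straightforward to compute, and the block diagonalization D.19 can be verified directly. This sketch assumes NumPy; `conditional_given_columns` is a hypothetical helper, the random $\Sigma$, $\mu$, and observed $Z_2$ are illustrative, and a linear solve is used in place of forming $\Sigma_{22}^{-1}$ explicitly.

```python
import numpy as np

def conditional_given_columns(mu, Sigma, Z2, r1):
    """Parameters of (Z1 | Z2) from theorem 44 for Z ~ CN_{m,r}(mu, Xi, Sigma),
    with Z = (Z1, Z2) split after the first r1 columns. Xi is unchanged."""
    mu1, mu2 = mu[:, :r1], mu[:, r1:]
    S11, S12 = Sigma[:r1, :r1], Sigma[:r1, r1:]
    S21, S22 = Sigma[r1:, :r1], Sigma[r1:, r1:]
    K = np.linalg.solve(S22, S21)                # Sigma22^{-1} Sigma21
    cond_mean = mu1 + (Z2 - mu2) @ K
    cond_col_cov = S11 - S12 @ K                 # Sigma_{11.2}
    return cond_mean, cond_col_cov, K

rng = np.random.default_rng(6)
m, r, r1 = 2, 4, 2
M = rng.standard_normal((r, r)) + 1j * rng.standard_normal((r, r))
Sigma = M @ M.conj().T + r * np.eye(r)           # Hermitian positive definite
mu = rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))
Z2 = rng.standard_normal((m, r - r1)) + 1j * rng.standard_normal((m, r - r1))

cond_mean, S11_2, K = conditional_given_columns(mu, Sigma, Z2, r1)

# The matrix B of the proof block-diagonalizes Sigma, as in (D.19)
B = np.block([[np.eye(r1), np.zeros((r1, r - r1))],
              [-K, np.eye(r - r1)]])
BSB = B.conj().T @ Sigma @ B
print(np.round(BSB, 10))
```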

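Corollary 10 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$. Then $Z^* \sim CN_{m,r}(\mu^*, \Xi^*, \Sigma^*)$. This is a variation on the complexification of Arnold's theorem 17.2(c) [31].

Proof. Let $X \sim CN(0, I, I)$ and $Z = AXB + \mu$, where $\Sigma = B^H B$ and $\Xi = AA^H$. Then $Z^* = A^*X^*B^* + \mu^*$. Using the characteristic function, we see

$$\begin{aligned}
\phi_{Z^*}(T) &= \mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[T^H(A^*X^*B^* + \mu^*)])]\} \\
&= \mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[T^H A^*X^*B^* + T^H\mu^*])]\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\,\mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[T^H A^*X^*B^*])]\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\,\mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[(B^*T^H A^*)X^*])]\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\,\mathcal{E}\{\exp[i\,\mathrm{Re}(\operatorname{tr}[(A^T TB^T)^H X^*])]\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\,\phi_{X^*}(A^T TB^T).
\end{aligned}$$

By proposition 38, $\phi_{X^*}(T) = \phi_X(T)$. Thus

$$\phi_{Z^*}(T) = \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\,\phi_X(A^T TB^T).$$

By equation D.5 we find that

$$\begin{aligned}
\phi_{Z^*}(T) &= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\exp\left\{-\tfrac{1}{4}\operatorname{tr}[(A^T TB^T)^H(A^T TB^T)]\right\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\exp\left\{-\tfrac{1}{4}\operatorname{tr}[B^*T^H A^*A^T TB^T]\right\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\exp\left\{-\tfrac{1}{4}\operatorname{tr}[T^H A^*A^T TB^T B^*]\right\} \\
&= \exp[i\,\mathrm{Re}(\operatorname{tr}[T^H\mu^*])]\exp\left\{-\tfrac{1}{4}\operatorname{tr}[T^H\Xi^*T\Sigma^*]\right\}.
\end{aligned}$$

This is the characteristic function of a variable distributed as $CN_{m,r}(\mu^*, \Xi^*, \Sigma^*)$.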


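Theorem 45 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$, where the matrix dimensions are $Z_{m\times r}$, $\mu_{m\times r}$, $\Xi_{m\times m}$, $\Sigma_{r\times r}$. Partition the random variable and parameters as follows. Let $Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$, $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$, and $\Xi = \begin{pmatrix} \Xi_{11} & \Xi_{12} \\ \Xi_{21} & \Xi_{22} \end{pmatrix}$, where $Z_1$ and $\mu_1$ are $m_1 \times r$ and $\Xi_{11}$ is $m_1 \times m_1$. Then $Z_1 \sim CN_{m_1,r}(\mu_1, \Xi_{11}, \Sigma)$ and

$$Z_2 \sim CN_{(m-m_1),r}(\mu_2, \Xi_{22}, \Sigma).$$

This is a variation on a complexification of Arnold's theorem 17.2(e).

Proof. Let $A = (I_{m_1}, 0)$. Then by theorem 41,

$$Z_1 = AZ \sim CN_{m_1,r}(A\mu, A\Xi A^H, \Sigma) = CN_{m_1,r}(\mu_1, \Xi_{11}, \Sigma). \qquad \text{(D.23)}$$

Likewise, let $B = (0, I_{m-m_1})$. Then again by theorem 41,

$$Z_2 = BZ \sim CN_{(m-m_1),r}(\mu_2, \Xi_{22}, \Sigma). \qquad \text{(D.24)}$$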


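Theorem 46 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$. Partition the random variable and parameters as follows. Let $Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$, $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$, $\Xi = \begin{pmatrix} \Xi_{11} & \Xi_{12} \\ \Xi_{21} & \Xi_{22} \end{pmatrix}$, and $T = \begin{pmatrix} T_1 \\ T_2 \end{pmatrix}$. Let $\Sigma \neq 0$. Then $Z_1$ and $Z_2$ are independent if and only if $\Xi_{12} = 0$. This is a variation of a complexification of Arnold's theorem 17.2(f) [31].

Proof. The characteristic function of $Z$ is given by

$$\begin{aligned}
\phi_Z(T) &= \exp\left\{i\,\mathrm{Re}\operatorname{tr}\left[(T_1^H, T_2^H)\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}\right] - \tfrac{1}{4}\operatorname{tr}\left[(T_1^H, T_2^H)\begin{pmatrix} \Xi_{11} & \Xi_{12} \\ \Xi_{21} & \Xi_{22} \end{pmatrix}\begin{pmatrix} T_1 \\ T_2 \end{pmatrix}\Sigma\right]\right\} \\
&= \exp\left\{i\,\mathrm{Re}(\operatorname{tr}[T_1^H\mu_1 + T_2^H\mu_2]) - \tfrac{1}{4}\operatorname{tr}\{(T_1^H\Xi_{11}T_1 + T_2^H\Xi_{21}T_1 + T_1^H\Xi_{12}T_2 + T_2^H\Xi_{22}T_2)\Sigma\}\right\} \\
&= \phi_{Z_1}(T_1)\,\phi_{Z_2}(T_2)\exp\left[-\tfrac{1}{4}\operatorname{tr}\{(T_2^H\Xi_{21}T_1 + T_1^H\Xi_{12}T_2)\Sigma\}\right] \\
&= \phi_{Z_1}(T_1)\,\phi_{Z_2}(T_2) \quad\text{if and only if } \Xi_{12} = 0. \qquad \text{(D.25)}
\end{aligned}$$

Thus, since the joint characteristic function factors into the product of the marginal characteristic functions if and only if the components are independent, $Z_1$ and $Z_2$ are independent if and only if $\Xi_{12} = 0$.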

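Theorem 47 Let $Z \sim CN_{m,r}(\mu, \Xi, \Sigma)$. Partition the random variable and parameters as follows. Let $Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$, $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$, $\Xi = \begin{pmatrix} \Xi_{11} & \Xi_{12} \\ \Xi_{21} & \Xi_{22} \end{pmatrix}$. Let $\Xi_{22}$ be nonsingular and define $\Xi_{11.2} \stackrel{\text{def}}{=} \Xi_{11} - \Xi_{12}\Xi_{22}^{-1}\Xi_{21}$. Then the conditional distribution of $Z_1$ given $Z_2$ is given by

$$\begin{aligned}
(Z_1 \mid Z_2) &\sim CN_{m_1,r}\left(\mu_1 + \Xi_{12}\Xi_{22}^{-1}(Z_2 - \mu_2),\ \Xi_{11} - \Xi_{12}\Xi_{22}^{-1}\Xi_{21},\ \Sigma\right) \\
&= CN_{m_1,r}\left(\mu_1 + \Xi_{12}\Xi_{22}^{-1}(Z_2 - \mu_2),\ \Xi_{11.2},\ \Sigma\right).
\end{aligned}$$

This is a variation on a complexification of Arnold's theorem 17.2(g) [31].

Proof. Let

$$A = \begin{pmatrix} I & -\Xi_{12}\Xi_{22}^{-1} \\ 0 & I \end{pmatrix}$$

and consider the transformation

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = AZ = A\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}.$$

Then

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} Z_1 - \Xi_{12}\Xi_{22}^{-1}Z_2 \\ Z_2 \end{pmatrix}. \qquad \text{(D.26)}$$

Thus the mean of the transformed random variable is

$$A\mu = \begin{pmatrix} \mu_1 - \Xi_{12}\Xi_{22}^{-1}\mu_2 \\ \mu_2 \end{pmatrix}.$$

The row covariance of $Y$ is $A\Xi A^H = \begin{pmatrix} \Xi_{11} - \Xi_{12}\Xi_{22}^{-1}\Xi_{21} & 0 \\ 0 & \Xi_{22} \end{pmatrix}$, so by theorems 41 and 46, $Y_1$ and $Y_2$ are independent and, by theorem 45,

$$(Y_1 \mid Y_2) \sim CN_{m_1,r}\left(\mu_1 - \Xi_{12}\Xi_{22}^{-1}\mu_2,\ \Xi_{11} - \Xi_{12}\Xi_{22}^{-1}\Xi_{21},\ \Sigma\right). \qquad \text{(D.29)}$$

Recall that $Y_2 = Z_2$ and $Y_1 = Z_1 - \Xi_{12}\Xi_{22}^{-1}Z_2$, which implies $Z_1 = Y_1 + \Xi_{12}\Xi_{22}^{-1}Z_2$. In the conditional density $f(Y_1 \mid Y_2 = Z_2)$, $Z_2$ is a constant. Apply theorem 41 to find $f(Z_1 \mid Z_2)$. Here $\nu = \Xi_{12}\Xi_{22}^{-1}Z_2$. The distribution mean is

$$\mu_1 - \Xi_{12}\Xi_{22}^{-1}\mu_2 + \Xi_{12}\Xi_{22}^{-1}Z_2 = \mu_1 + \Xi_{12}\Xi_{22}^{-1}(Z_2 - \mu_2). \qquad \text{(D.30)}$$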
Therefore the conditional distribution of $Z_1$ given $Z_2$ is

$$(Z_1 \mid Z_2) \sim CN_{m_1,r}\left(\mu_1 + \Xi_{12}\Xi_{22}^{-1}(Z_2 - \mu_2),\ \Xi_{11} - \Xi_{12}\Xi_{22}^{-1}\Xi_{21},\ \Sigma\right). \qquad \text{(D.31)}$$

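Theorem 48 Let $X \sim CN_{p,m}(\mu_1, \Xi_1, \Sigma)$ and $Y \sim CN_{p,m}(\mu_2, \Xi_2, \Sigma)$ be independent matrix complex normal random variables of the same size that share the column covariance $\Sigma$. Then the distribution of the sum $X + Y$ is given by

$$X + Y \sim CN_{p,m}(\mu_1 + \mu_2,\ \Xi_1 + \Xi_2,\ \Sigma).$$

I supplied this common theorem myself.

Proof. Since $X$ and $Y$ are independent, the characteristic function of their sum is the product of their characteristic functions. By equation D.9,

$$\begin{aligned}
\phi_{X+Y}(T) = \phi_X(T)\,\phi_Y(T) &= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu_1)] - \tfrac{1}{4}\operatorname{tr}(T^H\Xi_1 T\Sigma)\right\}\exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu_2)] - \tfrac{1}{4}\operatorname{tr}(T^H\Xi_2 T\Sigma)\right\} \\
&= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H(\mu_1 + \mu_2))] - \tfrac{1}{4}\operatorname{tr}(T^H(\Xi_1 + \Xi_2)T\Sigma)\right\},
\end{aligned}$$

which is the characteristic function of $CN_{p,m}(\mu_1 + \mu_2, \Xi_1 + \Xi_2, \Sigma)$. The shared column covariance is needed: with different column covariances, the two quadratic terms do not in general combine into a single term of the form in D.9. For example, when $X$ and $Y$ are independent $CN_{1,1}(0, 1, 1)$ variables, $X + Y$ has variance 2, not the value $(1 + 1)(1 + 1) = 4$ that the parameters $(\Xi_1 + \Xi_2, \Sigma_1 + \Sigma_2)$ would give. □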


D.2.3  Specialization to the Vector Complex Normal Distribution

We specialize a very few results to the vector complex normal distribution, since this is the form with which most engineers finish their statistical preparation. Let $z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}$ be an $n$-dimensional random vector such that the $z_j$ are independent and each element is distributed according to the standard univariate complex normal distribution $CN_1(0, 1)$. The characteristic function of the individual elements is $\phi_{z_j}(t_j) = \exp\left[-\frac{1}{4}t_j^*t_j\right]$. For $z = (z_j)_n$ independent and identically distributed, the characteristic function of the vector random variable is given by

$$\phi_z(t) = \prod_{j=1}^{n}\exp\left[-\tfrac{1}{4}t_j^*t_j\right] = \exp\left[-\tfrac{1}{4}t^H t\right]. \qquad \text{(D.32)}$$

The density function for the vector random variable of independent and identically distributed univariate complex Gaussian elements is given by

$$f(z) = \prod_{j=1}^{n}\frac{1}{\pi}\exp\left[-z_j^*z_j\right] = \frac{1}{\pi^n}\exp\left[-z^H z\right]. \qquad \text{(D.33)}$$

We denote the standardized vector complex normal distribution by $CN_n(0, I_n)$.
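Corollary 11 Let $z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} \sim CN_n(0, I_n)$, $A \in \mathbb{C}^{m\times n}$, $\mu \in \mathbb{C}^m$, $\Sigma = AA^H$, and $y = Az + \mu$. Then $y \sim CN_m(\mu, \Sigma)$.

Proof. By the transformation $y = Az + \mu$, we see that $y \in \mathbb{C}^m$. By theorem 18 from properties of a characteristic function, we see that

$$\begin{aligned}
\phi_y(t) = \phi_{Az+\mu}(t) &= \exp\{i\,\mathrm{Re}[t^H\mu]\}\,\phi_z(A^H t) \qquad \text{(D.34)} \\
&= \exp\{i\,\mathrm{Re}[t^H\mu]\}\exp\left[-\tfrac{1}{4}(A^H t)^H(A^H t)\right] = \exp\left\{i\,\mathrm{Re}[t^H\mu] - \tfrac{1}{4}t^H AA^H t\right\} \\
&= \exp\left\{i\,\mathrm{Re}[t^H\mu] - \tfrac{1}{4}t^H\Sigma t\right\}.
\end{aligned}$$

Thus $y \sim CN_m(\mu, \Sigma)$. The characteristic function presented here differs from that given by problem 2.66 of Anderson [26].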
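Corollary 11 is also the usual recipe for simulating a vector complex normal variable. The sketch below assumes NumPy; `sample_vector_cn` is a hypothetical helper, $A$ is taken as the Cholesky factor of $\Sigma$ (one of many valid choices), and the parameter values, sample size, and evaluation point $t$ are illustrative. It compares the sample mean, sample covariance $\mathcal{E}\{(y - \mu)(y - \mu)^H\}$, and empirical characteristic function with $\mu$, $\Sigma$, and $\exp\{i\,\mathrm{Re}[t^H\mu] - \frac{1}{4}t^H\Sigma t\}$.

```python
import numpy as np

def sample_vector_cn(mu, Sigma, n_samples, rng):
    """Draw samples of y = A z + mu with z ~ CN_n(0, I_n) and Sigma = A A^H
    (A taken from a Cholesky factorization). Returns shape (n_samples, m)."""
    A = np.linalg.cholesky(Sigma)
    m = mu.shape[0]
    z = (rng.standard_normal((n_samples, m))
         + 1j * rng.standard_normal((n_samples, m))) / np.sqrt(2)
    return z @ A.T + mu                        # row k is (A z_k + mu)^T

rng = np.random.default_rng(7)
mu = np.array([1.0 + 1.0j, -0.5j])
Sigma = np.array([[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]])
y = sample_vector_cn(mu, Sigma, 300_000, rng)

d = y - y.mean(axis=0)
Sigma_hat = d.T @ d.conj() / len(y)            # sample E{(y - mu)(y - mu)^H}

t = np.array([0.3 - 0.2j, 0.5 + 0.1j])
cf_hat = np.mean(np.exp(1j * np.real(y @ t.conj())))   # sample E exp{i Re(t^H y)}
cf_theory = np.exp(1j * np.real(t.conj() @ mu) - 0.25 * np.real(t.conj() @ Sigma @ t))
print(np.round(Sigma_hat, 2), cf_hat, cf_theory)
```

The division by $\sqrt{2}$ gives each component of $z$ unit variance, $\mathcal{E}|z_j|^2 = 1$, matching $CN_1(0, 1)$.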

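Corollary 12 Let $z = (z_1 \ \cdots \ z_n) \sim CN_n(0, I_n)$ be a row vector, $B \in \mathbb{C}^{n\times p}$, $\Sigma = B^H B$, and $y = zB + \nu$. Then $y \sim CN_p(\nu, \Sigma)$.

Proof. For independent and identically distributed $(z_j)_n$, we get the characteristic function

$$\phi_z(t) = \prod_{j=1}^{n}\exp\left[-\tfrac{1}{4}t_j^*t_j\right] = \prod_{j=1}^{n}\exp\left[-\tfrac{1}{4}t_jt_j^*\right] = \exp\left[-\tfrac{1}{4}tt^H\right], \qquad \text{(D.35)}$$

where $t = (t_1 \ \cdots \ t_n)$. The density function is given by

$$f(z) = \prod_{j=1}^{n}\frac{1}{\pi}\exp\left[-z_j^*z_j\right] = \prod_{j=1}^{n}\frac{1}{\pi}\exp\left[-z_jz_j^*\right] = \frac{1}{\pi^n}\exp\left[-zz^H\right].$$

By the transformation $y = zB + \nu$, we see that $y \in \mathbb{C}^{1\times p}$. From theorem 18, the characteristic function is given by

$$\phi_y(t) = \phi_{zB+\nu}(t) = \exp\{i\,\mathrm{Re}[\operatorname{tr}(t^H\nu)]\}\,\phi_z(tB^H), \qquad \text{(D.37)}$$

where we retain the notation $t$ but modify it so that it is now $t = (t_1 \ \cdots \ t_p)$. Then

$$\begin{aligned}
\phi_y(t) &= \exp\{i\,\mathrm{Re}[\operatorname{tr}(t^H\nu)]\}\exp\left[-\tfrac{1}{4}\operatorname{tr}(Bt^H tB^H)\right] \qquad \text{(D.38)} \\
&= \exp\{i\,\mathrm{Re}[\operatorname{tr}(t^H\nu)]\}\exp\left[-\tfrac{1}{4}\operatorname{tr}(tB^H Bt^H)\right] \\
&= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(t^H\nu)] - \tfrac{1}{4}\operatorname{tr}(t\Sigma t^H)\right\} = \exp\left\{i\,\mathrm{Re}[\nu t^H] - \tfrac{1}{4}t\Sigma t^H\right\}. \qquad \text{(D.39)}
\end{aligned}$$

Therefore $y \sim CN_p(\nu, \Sigma)$.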


Theorem 49 Let $Z = \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}$ and $\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}$, where $Z_k$ and $\mu_k$ are row vectors, and the $(Z_k)_n$ are independently distributed according to the $p$-variate vector complex normal distribution $CN_p(\mu_k, \Sigma)$. Then $Z \sim CN_{n,p}(\mu, I, \Sigma)$. This is a complexification of the first part of Arnold's theorem 17.3 [31].
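Proof. This is a complexification of Arnold's proof, where I have also used characteristic functions rather than moment generating functions. By equation D.39,

$$\phi_{Z_k}(T_k) = \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T_k^H\mu_k)] - \tfrac{1}{4}\operatorname{tr}(T_k\Sigma T_k^H)\right\}. \qquad \text{(D.40)}$$

Thus $\phi_Z(T) = \prod_{k=1}^{n}\phi_{Z_k}(T_k)$ by independence. Then

$$\begin{aligned}
\phi_Z(T) &= \prod_{k=1}^{n}\exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T_k^H\mu_k)] - \tfrac{1}{4}\operatorname{tr}(T_k\Sigma T_k^H)\right\} \\
&= \exp\left\{i\,\mathrm{Re}\left[\sum_{k=1}^{n}\operatorname{tr}(T_k^H\mu_k)\right] - \tfrac{1}{4}\sum_{k=1}^{n}\operatorname{tr}(T_k\Sigma T_k^H)\right\} \\
&= \exp\left\{i\,\mathrm{Re}\left[\operatorname{tr}\left((T_1^H \ \cdots \ T_n^H)\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}\right)\right] - \tfrac{1}{4}\operatorname{tr}\left[\begin{pmatrix} T_1 \\ \vdots \\ T_n \end{pmatrix}\Sigma(T_1^H \ \cdots \ T_n^H)\right]\right\} \\
&= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)] - \tfrac{1}{4}\operatorname{tr}[T\Sigma T^H]\right\} \qquad \text{(D.41)} \\
&= \exp\left\{i\,\mathrm{Re}[\operatorname{tr}(T^H\mu)] - \tfrac{1}{4}\operatorname{tr}[T^H IT\Sigma]\right\}, \qquad \text{(D.42)}
\end{aligned}$$

which is the characteristic function of $CN_{n,p}(\mu, I, \Sigma)$.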

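Theorem 50 Let $Z \sim CN_{n,p}(\mu, I, \Sigma)$ and partition $Z = \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}$, $\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}$, where $Z_k$ and $\mu_k$ are row vectors. Then the $Z_k$ are independently distributed according to $Z_k \sim CN_p(\mu_k, \Sigma)$. This is a complexification of the second part of Arnold's theorem 17.3 [31].

Proof. Note that $\Xi = I$, and thus $\Xi_{ij} = 0$ for $i \neq j$. Then by theorem 46 (applied repeatedly to split off one row at a time) and theorem 45, the $Z_k$ are independently distributed according to $Z_k \sim CN_p(\mu_k, \Sigma)$.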

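D.2.4  Matrix Complex Normal Density Function

Theorem 51 Let $Z \sim CN_{n,p}(\mu, \Xi, \Sigma)$, where $\Xi$ and $\Sigma$ are Hermitian positive definite. Then $Z$ has the joint density function

$$f(Z) = \pi^{-np}|\det\Xi|^{-p}|\det\Sigma|^{-n}\exp\left\{-\operatorname{tr}\left[\Xi^{-1}(Z-\mu)\Sigma^{-1}(Z-\mu)^H\right]\right\}. \qquad \text{(D.43)}$$

This is a complexification of Arnold's theorem 17. j [31].

Proof. This is a complexification of Arnold's proof. Let $X \sim CN_{n,p}(0, I_n, I_p)$, or equivalently let the $X_{jk} \sim CN_1(0, 1)$ be independent. Then by equation D.6,

$$f_X(X) = \pi^{-np}\exp\left[-\operatorname{tr}(X^H X)\right]. \qquad \text{(D.44)}$$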
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Let  A  and  B  be  nonsingular  matrices  such  that  E  =  AA^  and  S  =  as 

in  equation  D.8.  By  theorem  119,  this  factorization  is  possible  because  E  and 
S  are  positive  definite.  Let  Z  =  AXB  +  ~  CA^„,p(//,E,  S)  by  theorem  41. 

Then  X  =  A~^{Z  —  fi)B~^.  By  theorem  34,  the  absolute  value  of  the  Jacobian 
of  this  transformation  is  given  by 

I  J|  =  |det  jdet  (D.45) 

where  the  result  is  modified  by  our  previous  regarding  the  Jacobian  of  a  com¬ 
plex  linear  transformation.  Thus 

|J1  =  IdetEpndetSp"  (D.46) 

Therefore,  Z  has  the  density 

MZ)  =  Jx(A-'{Z  -  |J1  =  Jx(A-'{Z  -  IdetHr”  |det  S|- 

=  .p.|detsridetBr 
=  .^idetsridetsr 

=  .>-|de.;naetSr  -  .)«)  (D.47) 
D.2.5 Specialization to the Vector Complex Normal Density Function

The special case of the vector complex normal distribution is widely used in applications, and thus deserves explicit attention. The following are corollaries to the matrix complex normal density which was just derived.

Corollary 13 Let $z \sim CN_{1,p}(\mu, 1, \Sigma)$ where $\Sigma$ is Hermitian positive definite. Then $z$ has the joint density

$$f_z(z) = \pi^{-p}\,|\det \Sigma|^{-1} \exp\left[-(z-\mu)\Sigma^{-1}(z-\mu)^H\right]$$

Recall here that $z$ is a row vector.

Corollary 14 Let $Z \sim CN_{n,p}(\mu, I, \Sigma)$ where $\Sigma$ is Hermitian positive definite. Then $Z$ has the joint density

$$f_Z(Z) = \pi^{-np}\,|\det \Sigma|^{-n} \exp\left\{-\mathrm{tr}\left[(Z-\mu)\Sigma^{-1}(Z-\mu)^H\right]\right\}$$

Recall here that $Z$ is a matrix whose rows are independent.

Corollary 15 Let $z \sim CN_{n,1}(\mu, \Xi, 1)$ where $\Xi$ is Hermitian positive definite. Then $z$ has the joint density

$$f_z(z) = \pi^{-n}\,|\det \Xi|^{-1} \exp\left[-(z-\mu)^H \Xi^{-1}(z-\mu)\right]$$

Here, $z$ is a column vector.

Corollary 16 Let $Z \sim CN_{n,p}(\mu, \Xi, I)$ where $\Xi$ is Hermitian positive definite. Then $Z$ has the joint density

$$f_Z(Z) = \pi^{-np}\,|\det \Xi|^{-p} \exp\left\{-\mathrm{tr}\left[(Z-\mu)^H \Xi^{-1}(Z-\mu)\right]\right\}$$

Here, $Z$ is a matrix whose columns are independent. This is the form usually seen in the literature.
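As a numerical sanity check on the normalizing constant $\pi^{-p}$ (rather than the real-case $(2\pi)^{-p/2}$) in Corollary 13, the following Python sketch takes the scalar case $p = 1$, where the density reduces to $f(z) = (\pi\sigma^2)^{-1}\exp(-|z-\mu|^2/\sigma^2)$, and integrates it numerically over a square grid in the complex plane. The helper names `cn1_density` and `grid_moments`, the grid width, and the step size are illustrative assumptions, not part of the thesis. The second integral checks that $\Sigma = \sigma^2$ is the variance $\mathcal{E}|z-\mu|^2$.

```python
import math


def cn1_density(z, mu, sigma2):
    """Scalar (p = 1) vector complex normal density:
    f(z) = exp(-|z - mu|^2 / sigma2) / (pi * sigma2)."""
    return math.exp(-abs(z - mu) ** 2 / sigma2) / (math.pi * sigma2)


def grid_moments(mu, sigma2, half_width=10.0, h=0.05):
    """Midpoint-rule integrals of f and |z - mu|^2 f over a square centred at mu."""
    steps = int(round(2 * half_width / h))
    mass = 0.0
    second = 0.0
    for a in range(steps):
        x = mu.real - half_width + (a + 0.5) * h
        for b in range(steps):
            z = complex(x, mu.imag - half_width + (b + 0.5) * h)
            f = cn1_density(z, mu, sigma2)
            mass += f * h * h                       # should approach 1
            second += abs(z - mu) ** 2 * f * h * h  # should approach sigma2
    return mass, second


mass, second = grid_moments(1 + 0.5j, 2.0)
print(round(mass, 6), round(second, 6))
```

A grid rule is used only because the integrand is smooth and decays fast; it is a check of the constant, not a replacement for the Jacobian argument in the proof of theorem 51.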


D.3  Complex  Wishart  Distribution 

The  object  of  this  section  is  to  develop  the  definition  and  properties  of  the 
complex  Wishart  distribution.  The  development  that  follows  is  primarily  a 
complexification  of  Arnold’s  section  17.3  [31]. 


D.3.1  Introduction 


Definition 6 Let the matrix

$$Z = \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}$$

be distributed according to the matrix complex normal distribution $CN_{n,p}(\mu, I_n, \Sigma)$. The row vectors are independent, and $Z_i \sim CN_p(\mu_i, \Sigma)$. Let

$$W = Z^H Z = \sum_{i=1}^{n} Z_i^H Z_i$$

Then $W$ is defined to have a complex Wishart distribution. $W$ is a $p \times p$ complex matrix that is Hermitian nonnegative definite. We identify this distribution by the notation $CW_p(n, \Sigma, \mu^H\mu)$ or $CW_p(n, \Sigma, \delta)$, where $\delta$ is called the noncentrality parameter.

Lemma 14 Let $W$ have a complex Wishart distribution derived from $Z \sim CN_{n,p}(\mu, I, \Sigma)$. Then the distribution of $W$ depends on the matrix mean parameter $\mu$ only through the noncentrality parameter $\delta = \mu^H\mu$. This is a complexification of Arnold's lemma 17.5 [31].

Proof. This is a complexification of Arnold's proof. We use an invariance argument. Let $\Gamma$ be an $n \times n$ unitary matrix. Then

$$Y = \Gamma Z \sim CN_{n,p}(\Gamma\mu, \Gamma I_n \Gamma^H, \Sigma) = CN_{n,p}(\Gamma\mu, I_n, \Sigma) \qquad \text{(D.49)}$$

by theorem 41. Further,

$$Y^H Y = (\Gamma Z)^H \Gamma Z = Z^H \Gamma^H \Gamma Z = Z^H Z = W \qquad \text{(D.50)}$$

The distribution of $W$ is therefore the same for any unitary transformation $\Gamma$ of $Z$. If $F(W; \mu)$ is the distribution function of $W$ for a particular $\mu$, then $F(W; \mu) = F(W; \Gamma\mu)$. Hence, $F$ is invariant under the group $G$ of unitary transformations $g(\mu) = \Gamma\mu$. Note the following.

$$\delta(\Gamma\mu) \stackrel{\text{def}}{=} (\Gamma\mu)^H \Gamma\mu = \mu^H \Gamma^H \Gamma \mu = \mu^H \mu \stackrel{\text{def}}{=} \delta(\mu) \qquad \text{(D.51)}$$

Thus $\delta$ is invariant under $G$. Also,

$$\delta(\mu) = \delta(\nu) \implies \mu^H\mu = \nu^H\nu \qquad \text{(D.52)}$$

By theorem 123, there then exists a unitary transformation $\Gamma$ such that $\mu = \Gamma\nu$. Thus $\delta$ is a maximal invariant by Arnold's definition (p. 13) [31], where $T = \delta$.

So we have a group $G$ of invertible unitary functions $g(\mu) = \Gamma\mu$ that map the space $\mathcal{C} = \{\mu\}$ onto itself, a maximal invariant $\delta(\mu)$ under $G$, and a function $F_1$ satisfying $F_1(g(\mu)) = F(W; \Gamma\mu) = F(W; \mu) = F_1(\mu)$. Thus by Arnold's lemma 1.11 [31], there exists a function $k(\delta)$ such that $F_1(\mu) = k(\delta(\mu)) = k(\mu^H\mu)$. Thus the distribution of $W$ depends on $\mu$ only through $\delta = \mu^H\mu$, as claimed. □

Similar to Arnold's observation for the real variables case, we note that the distribution of $W$ defined by equations D.49 and D.50 is a $p$-dimensional complex Wishart distribution with $n$ degrees of freedom, covariance matrix $\Sigma$, and noncentrality matrix

$$\delta = \mu^H \mu \qquad \text{(D.53)}$$

This distribution is symbolized by $W \sim CW_p(n, \Sigma, \delta)$. Note that both $\Sigma$ and $\delta$ are Hermitian nonnegative definite, which is symbolized by $\Sigma \geq 0$ and $\delta \geq 0$. If $\delta = 0$, then $W$ has a central complex Wishart distribution, which we denote by $W \sim CW_p(n, \Sigma)$. If $\delta \neq 0$, then $W$ has a noncentral complex Wishart distribution.

Notation for this and other distributions is not standardized. I have adopted Arnold's notation. For the real Wishart distribution, I have also seen variations of $W(p, n; \Sigma, \delta)$. For this reason, it is always best to define your notation at least once in your work.
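D.3.2 Properties of the Complex Wishart Distribution

Theorem 52 Let $W \sim CW_p(n, \Sigma, \delta)$. Then $\mathcal{E}\{W\} = n\Sigma + \delta$. This is a complexification of Arnold's theorem 17.6(a) [31].

Proof. This is a complexification of Arnold's proof. Let $W = \sum_{i=1}^{n} Z_i^H Z_i$, where the $Z_i \sim CN_p(\mu_i, \Sigma)$ are independent row vectors. Then

$$
\begin{aligned}
\Sigma &= \mathcal{E}\{(Z_i - \mu_i)^H (Z_i - \mu_i)\} = \mathcal{E}\{Z_i^H Z_i - \mu_i^H Z_i - Z_i^H \mu_i + \mu_i^H \mu_i\} \\
&= \mathcal{E}\{Z_i^H Z_i\} - \mu_i^H \mathcal{E}\{Z_i\} - \mathcal{E}\{Z_i^H\}\mu_i + \mu_i^H \mu_i = \mathcal{E}\{Z_i^H Z_i\} - \mu_i^H \mu_i
\end{aligned}
$$

By rearranging the equation, we get $\mathcal{E}\{Z_i^H Z_i\} = \Sigma + \mu_i^H \mu_i$. Therefore,

$$\mathcal{E}\{W\} = \mathcal{E}\left\{\sum_{i=1}^{n} Z_i^H Z_i\right\} = \sum_{i=1}^{n} \mathcal{E}\{Z_i^H Z_i\} = \sum_{i=1}^{n} \left(\Sigma + \mu_i^H \mu_i\right) = n\Sigma + \sum_{i=1}^{n} \mu_i^H \mu_i = n\Sigma + \mu^H \mu = n\Sigma + \delta$$

Therefore, $\mathcal{E}\{W\} = n\Sigma + \delta$. □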
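Theorem 52 lends itself to a quick Monte Carlo check. The sketch below is illustrative only: it assumes rows are generated as $Z_i = \mu_i + X_i B$ with $X_i$ a row of independent $CN_1(0,1)$ variables (real and imaginary parts $N(0, 1/2)$), so that $Z_i \sim CN_p(\mu_i, B^H B)$ in the row-vector convention used here. The helper names (`cn_row`, `mean_wishart`, `theory_mean`), the particular $B$ and $\mu_i$, the seed, and the tolerance are hypothetical choices. It compares the average of $W = \sum_i Z_i^H Z_i$ with $n\Sigma + \delta$.

```python
import math
import random


def cn_row(rng, mu, B):
    """One row Z = mu + X B, X a 1 x p row of iid CN_1(0, 1); then Z ~ CN_p(mu, B^H B)."""
    p = len(mu)
    s = math.sqrt(0.5)  # CN_1(0, 1): real and imaginary parts are independent N(0, 1/2)
    x = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(p)]
    return [mu[j] + sum(x[i] * B[i][j] for i in range(p)) for j in range(p)]


def mean_wishart(mus, B, reps=10000, seed=1):
    """Monte Carlo average of W = sum_i Z_i^H Z_i over independent replications."""
    rng = random.Random(seed)
    p = len(B)
    acc = [[0j] * p for _ in range(p)]
    for _rep in range(reps):
        for mu in mus:
            z = cn_row(rng, mu, B)
            for j in range(p):
                for l in range(p):
                    acc[j][l] += z[j].conjugate() * z[l] / reps
    return acc


def theory_mean(mus, B):
    """n Sigma + delta, with Sigma = B^H B and delta = sum_i mu_i^H mu_i."""
    p, n = len(B), len(mus)
    sigma = [[sum(B[i][j].conjugate() * B[i][l] for i in range(p)) for l in range(p)]
             for j in range(p)]
    delta = [[sum(mu[j].conjugate() * mu[l] for mu in mus) for l in range(p)]
             for j in range(p)]
    return [[n * sigma[j][l] + delta[j][l] for l in range(p)] for j in range(p)]


B = [[1, 0.5j], [0, 1]]                       # Sigma = [[1, 0.5i], [-0.5i, 1.25]]
mus = [[1, 0], [0, 1j], [0, 0], [1 + 1j, 0]]  # rows of the mean matrix mu (n = 4)
empirical = mean_wishart(mus, B)
expected = theory_mean(mus, B)
```

Agreement here is evidence, not proof; it mainly guards against sign or conjugation slips in the row-vector convention $W = Z^H Z$.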

This proof is important because it was the independent information source that provided a clue that the function presented by Goodman [92] and Anderson [26] as the characteristic function of the complex Wishart distribution was the characteristic function of something slightly different. This became apparent when I tried to compute the first moment by differentiation and did not obtain the above result. Goodman correctly states the set of variables whose characteristic function he supplied, but the importance of what he said was not obvious to me until I tried to use the function for some computational purpose.


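Lemma 15 Let

$$V^T = (X_1, Y_1, X_2, Y_2, \ldots, X_n, Y_n) = (V_1, V_2, V_3, V_4, \ldots, V_{2n-1}, V_{2n})$$

be distributed according to the real vector normal distribution $N_{2n}(\nu, \tfrac{1}{2}\sigma^2 I)$, where

$$\nu^T = (\nu_1, \ldots, \nu_{2n}) = (\mu_{R1}, \mu_{I1}, \mu_{R2}, \mu_{I2}, \ldots, \mu_{Rn}, \mu_{In})$$

and $\sigma^2 > 0$ is a real scalar. Then

$$\frac{2}{\sigma^2} V^T V \sim \chi^2_{2n}\left(\frac{2}{\sigma^2}\nu^T\nu\right)$$

The variable names $X_k$, $Y_k$, $\mu_{Rk}$, $\mu_{Ik}$ are defined here to suggest notation used in the proof of theorem 53. This is a slight variation of Arnold's lemma 3.8 [31].

Proof. This is a slight modification of Arnold's proof, where I accounted for the variation of the theorem and also used characteristic functions rather than moment generating functions. Because $V \sim N_{2n}(\nu, \tfrac{1}{2}\sigma^2 I)$, the $V_k$ are independent and $V_k \sim N_1(\nu_k, \tfrac{1}{2}\sigma^2)$. From this, we compute the characteristic function of the joint distribution as follows.

$$
\begin{aligned}
\phi_V(t) &= \prod_{k=1}^{2n} \phi_{V_k}(t_k) = \prod_{k=1}^{2n} \exp\left(i t_k \nu_k - \tfrac{1}{2} t_k^2 \cdot \tfrac{1}{2}\sigma^2\right) \\
&= \exp\left[i(t_1\nu_1 + \cdots + t_{2n}\nu_{2n}) - \tfrac{1}{4}\sigma^2 (t_1^2 + \cdots + t_{2n}^2)\right] = \exp\left(i t^T \nu - \tfrac{1}{4}\sigma^2 t^T t\right)
\end{aligned}
$$

Recall that $V^T V = \sum_k V_k^2$, $\nu^T\nu = \sum_k \nu_k^2$, and $\frac{\sqrt{2}}{\sigma} V \sim N_{2n}\left(\frac{\sqrt{2}}{\sigma}\nu, I\right)$. Therefore

$$\frac{2}{\sigma^2} V^T V = \left(\frac{\sqrt{2}}{\sigma}V\right)^T \left(\frac{\sqrt{2}}{\sigma}V\right) \sim \chi^2_{2n}\left(\frac{2}{\sigma^2}\nu^T\nu\right)$$

by the definition of the noncentral distribution given in Arnold section 1.4 [31]. □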


Notation. It is common for the distributional notation to be slightly abused as follows to simplify discussion. The abusive notation $u \sim a\chi^2_n(\delta)$ is intended to mean $u/a \sim \chi^2_n(\delta)$. If we expand the shorthand notation for the distribution into its density function for both cases, it becomes obvious that the paired relationship is not strictly true; that would be a statement about how a change of variables is implemented. However, the abusive notation does allow simplification of other developments which involve ratios in which the abusive constants divide out, and no one is the wiser. Statisticians have also used this convention with other distributions in journal articles. Caveat emptor.
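To make the caveat concrete, here is the change of variables that the shorthand hides (a short worked equation added for illustration, not taken from the thesis), for a real constant $a > 0$:

```latex
u \sim a\,\chi^2_n(\delta)
\quad\text{means}\quad
\frac{u}{a} \sim \chi^2_n(\delta),
\qquad\text{so}\qquad
f_u(u) = \frac{1}{a}\, f_{\chi^2_n(\delta)}\!\left(\frac{u}{a}\right), \quad u > 0 .
```

The constant $a$ cancels in a ratio such as $(u_1/a)/(u_2/a) = u_1/u_2$, which is the situation in which the shorthand is harmless.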


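Theorem 53 (Important) Let $W \sim CW_{p=1}(n, \Sigma = \sigma^2 > 0, \delta)$. Then $\frac{2}{\sigma^2}W \sim \chi^2_{2n}\left(\frac{2\delta}{\sigma^2}\right)$. This is a complexification of Arnold's theorem 17.6(b) [31], which was stated without proof.

Proof. Let $Z \sim CN_n(\mu, \sigma^2 I)$. This implies $Z_k \sim CN_1(\mu_k, \sigma^2)$. Let $Z_k = X_k + iY_k$. Recall that

$$W = Z^H Z = \sum_{k=1}^{n} \bar{Z}_k Z_k = \sum_{k=1}^{n} (X_k - iY_k)(X_k + iY_k) = \sum_{k=1}^{n} (X_k^2 + Y_k^2)$$

Examining $Z_k$, we see that $Z_k \sim CN_1(\mu_k, \sigma^2)$ implies that the real and imaginary parts of $Z_k$ are distributed as $X_k \sim N_1(\mu_{Rk}, \tfrac{1}{2}\sigma^2)$ and $Y_k \sim N_1(\mu_{Ik}, \tfrac{1}{2}\sigma^2)$. Recall that $CN_1(\mu, \sigma^2)$ is isomorphic to

$$N_2\left(\begin{pmatrix}\mu_R \\ \mu_I\end{pmatrix}, \frac{1}{2}\begin{pmatrix}\sigma^2 & 0 \\ 0 & \sigma^2\end{pmatrix}\right)$$

Let $\sigma = \sigma_R + i\sigma_I$. Then

$$\sigma\bar{\sigma} = (\sigma_R + i\sigma_I)(\sigma_R - i\sigma_I) = \sigma_R^2 + \sigma_I^2$$

Written in matrix analogy,

$$\frac{1}{2}\begin{pmatrix}\sigma_R & -\sigma_I \\ \sigma_I & \sigma_R\end{pmatrix}\begin{pmatrix}\sigma_R & \sigma_I \\ -\sigma_I & \sigma_R\end{pmatrix} = \frac{1}{2}\begin{pmatrix}\sigma_R^2 + \sigma_I^2 & 0 \\ 0 & \sigma_R^2 + \sigma_I^2\end{pmatrix}$$

is isomorphic to the above. Using the notation from lemma 15, we note that $W = V^T V = \sum_{k=1}^{n}(X_k^2 + Y_k^2)$ and $\nu^T\nu = \sum_{k=1}^{n}(\mu_{Rk}^2 + \mu_{Ik}^2) = \mu^H\mu = \delta$. Thus when $W \sim CW_1(n, \sigma^2, \delta)$, we have

$$\frac{2}{\sigma^2}W \sim \chi^2_{2n}\left(\frac{2}{\sigma^2}\nu^T\nu\right) = \chi^2_{2n}\left(\frac{2\delta}{\sigma^2}\right)$$

□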
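Theorem 53 can be checked by simulation in the scalar case. The Python sketch below is illustrative: the function name `scaled_w`, the means $\mu_k$, $\sigma^2$, the seed, and the tolerances are assumptions. It draws $X_k$ and $Y_k$ exactly as in the proof, forms $\frac{2}{\sigma^2}W$, and compares its sample mean and variance with those of $\chi^2_{2n}(\lambda)$, namely $2n + \lambda$ and $4n + 4\lambda$ with $\lambda = 2\delta/\sigma^2$, under the noncentrality convention of lemma 15 (mean of $\chi^2_k(\lambda)$ equal to $k + \lambda$).

```python
import math
import random


def scaled_w(rng, mus, sigma2):
    """(2 / sigma2) W for W = sum_k |Z_k|^2 with independent Z_k ~ CN_1(mu_k, sigma2)."""
    s = math.sqrt(sigma2 / 2.0)  # X_k ~ N(Re mu_k, sigma2/2), Y_k ~ N(Im mu_k, sigma2/2)
    w = 0.0
    for mu in mus:
        x = rng.gauss(mu.real, s)
        y = rng.gauss(mu.imag, s)
        w += x * x + y * y  # |Z_k|^2 = X_k^2 + Y_k^2
    return 2.0 * w / sigma2


mus = [1 + 1j, 0.5, -1j]
sigma2 = 2.0
n = len(mus)
delta = sum(abs(mu) ** 2 for mu in mus)   # W ~ CW_1(n, sigma2, delta)
lam = 2.0 * delta / sigma2                 # noncentrality of the chi-square
mean_theory = 2 * n + lam                  # mean of chi^2_k(lam) is k + lam
var_theory = 2 * (2 * n) + 4 * lam         # variance is 2k + 4 lam

rng = random.Random(7)
reps = 50000
draws = [scaled_w(rng, mus, sigma2) for _ in range(reps)]
mean_mc = sum(draws) / reps
var_mc = sum((d - mean_mc) ** 2 for d in draws) / (reps - 1)
```

Matching both the first and second moments is a stronger check than the mean alone, since the factor 2 in $2n$ degrees of freedom shows up directly in the variance.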

Discussion. We will see this theorem used later in developing the form of the density function for the complex Wishart distribution, in a proof by finite induction. Tague [264] used this theorem in his development of the signal-to-noise ratio at the output of a beamformer.


Lemma 16 Let $W \sim CW_p(n, \Sigma, \delta)$. Let $a > 0$ be a real scalar, and let $a = \bar{b}b$. Then

$$aW \sim CW_p(n, a\Sigma, a\delta)$$

This is a complexification of Arnold's theorem 17.6(c) [31], which was stated without proof.

Proof. Let $W = Z^H Z$ where $Z \sim CN_{n,p}(\mu, I, \Sigma)$. Then

$$aW = aZ^H Z = (bZ)^H (bZ)$$

By lemma 13 we know

$$bZ \sim CN_{n,p}(b\mu, I, |b|^2\Sigma) = CN_{n,p}(b\mu, I, a\Sigma)$$

We also know that

$$(b\mu)^H (b\mu) = a\mu^H\mu = a\delta$$

Therefore $aW \sim CW_p(n, a\Sigma, a\delta)$. □


Theorem 54 (Important) Let $W \sim CW_p(n, \Sigma, \delta)$ and $A \in \mathbb{C}^{k \times p}$. Then

$$AWA^H \sim CW_k(n, A\Sigma A^H, A\delta A^H)$$

This is a complexification of Arnold's theorem 17.6(d) [31].

Proof. This is a complexification of Arnold's proof. Let $W = Z^H Z$ where $Z \sim CN_{n,p}(\mu, I, \Sigma)$. Then

$$ZA^H \sim CN_{n,k}(\mu A^H, I, A\Sigma A^H)$$

by theorem 41. Thus, since $W \sim CW_p(n, \Sigma, \delta)$,

$$(ZA^H)^H (ZA^H) = AZ^H ZA^H = AWA^H \sim CW_k(n, A\Sigma A^H, A\delta A^H)$$

where

$$(\mu A^H)^H (\mu A^H) = A\mu^H\mu A^H = A\delta A^H$$

□
Theorem 55 Let $W \sim CW_p(n, \Sigma, \delta)$, $A = (I, 0)$, and $B = (0, I)$. Define the following partitions:

$$W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \quad \delta = \begin{pmatrix} \delta_{11} & \delta_{12} \\ \delta_{21} & \delta_{22} \end{pmatrix}$$

where $W_{11}$, $\Sigma_{11}$, and $\delta_{11}$ are $k \times k$ matrices. Then $AWA^H = W_{11} \sim CW_k(n, \Sigma_{11}, \delta_{11})$ and $BWB^H = W_{22} \sim CW_{p-k}(n, \Sigma_{22}, \delta_{22})$. This is a complexification of Arnold's theorem 17.6(e) [31], which was stated without proof.

Proof. The results for both $AWA^H$ and $BWB^H$ follow directly from theorem 54. □


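Theorem 56 Let $W \sim CW_p(n, \Sigma, \delta)$. Partition $W$, $\Sigma$, and $\delta$ into identical blocks of $p_1, p_2, \ldots, p_q$ rows and columns where $p_1 + \cdots + p_q = p$:

$$W = \begin{pmatrix} W_{11} & \cdots & W_{1q} \\ \vdots & \ddots & \vdots \\ W_{q1} & \cdots & W_{qq} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \cdots & \Sigma_{1q} \\ \vdots & \ddots & \vdots \\ \Sigma_{q1} & \cdots & \Sigma_{qq} \end{pmatrix}, \quad \delta = \begin{pmatrix} \delta_{11} & \cdots & \delta_{1q} \\ \vdots & \ddots & \vdots \\ \delta_{q1} & \cdots & \delta_{qq} \end{pmatrix}$$

Let $\Sigma_{ij} = 0$ for $i \neq j$. Then the $\{W_{ii}\}$ are independent and $W_{ii} \sim CW_{p_i}(n, \Sigma_{ii}, \delta_{ii})$. This is a complexification and generalization of Anderson's theorem 7.3.5 [26] to the noncentral complex Wishart case.

Proof. This is a complexification and generalization of Anderson's proof. $W$ has the same distribution as $\sum_{k=1}^{n} Z_k^H Z_k$, where the $Z_k$ are independent and distributed as $Z_k \sim CN_p(\mu_k, \Sigma)$. Partition the row vectors $Z_k$ and $\mu_k$ into blocks $Z_k^{(1)}, \ldots, Z_k^{(q)}$ of $p_1, p_2, \ldots, p_q$ components. We know by theorem 43 or theorem 46 that $Z_k^{(1)}, \ldots, Z_k^{(q)}$ are independent because $\Sigma_{ij} = 0$ for $i \neq j$. Because the $Z_k$ are also independent, we know that

$$Z_1^{(1)}, \ldots, Z_1^{(q)}, Z_2^{(1)}, \ldots, Z_2^{(q)}, \ldots, Z_n^{(1)}, \ldots, Z_n^{(q)}$$

are independent. Thus

$$W_{11} = \sum_{k=1}^{n} Z_k^{(1)H} Z_k^{(1)}, \quad \ldots, \quad W_{qq} = \sum_{k=1}^{n} Z_k^{(q)H} Z_k^{(q)}$$

are independent. Let $A_i \in \mathbb{C}^{p_i \times p}$ where $A_i = (0, I_{p_i}, 0)$. By theorem 54,

$$W_{ii} = A_i W A_i^H \sim CW_{p_i}(n, \Sigma_{ii}, \delta_{ii})$$

□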


Corollary 17 As in theorem 56, let $W \sim CW_p(n, \Sigma, \delta)$ and partition $W$, $\Sigma$, and $\delta$ into identical blocks of $p_1, p_2, \ldots, p_q$ rows and columns where $p_1 + \cdots + p_q = p$, and let $\Sigma_{ij} = 0$ for $i \neq j$. When $\delta = 0$, then $W_{ii} \sim CW_{p_i}(n, \Sigma_{ii})$. This is a complexification of theorem 7.3.5 of Anderson [26].

Proof. Substitute $\delta_{ii} = 0$ into theorem 56. □

Theorem 57 Let $W \sim CW_p(n, \Sigma, \delta)$ and let $c$ be any nonzero $p \times 1$ vector. Then

$$\frac{2c^H W c}{c^H \Sigma c} \sim \chi^2_{2n}\left(\frac{2c^H \delta c}{c^H \Sigma c}\right)$$

This is a complexification of Graybill's theorem 9.3.2(4).

Proof. This proof differs from Graybill's in order to take advantage of other work already presented here. By theorem 54,

$$c^H W c \sim CW_1(n, c^H \Sigma c, c^H \delta c)$$

Applying theorem 53, with $c^H \Sigma c > 0$ so that the denominator does not vanish, we obtain our final result that

$$\frac{2c^H W c}{c^H \Sigma c} \sim \chi^2_{2n}\left(\frac{2c^H \delta c}{c^H \Sigma c}\right)$$

□
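A Monte Carlo sketch of Theorem 57 for $p = 2$ follows. It is illustrative only and assumes a hypothetical row generator $Z_i = \mu_i + X_i B$ (with $X_i$ a row of iid $CN_1(0,1)$ variables) as a stand-in for theorem 41; the vector $c$, the matrices, the seed, and the tolerances are arbitrary choices. It uses the identity $c^H W c = \sum_i |Z_i c|^2$ and compares the sample mean and variance of $2c^HWc/(c^H\Sigma c)$ with $2n + \lambda$ and $4n + 4\lambda$, where $\lambda = 2c^H\delta c/(c^H\Sigma c)$.

```python
import math
import random


def cn_row(rng, mu, B):
    """One row Z = mu + X B, X a 1 x p row of iid CN_1(0, 1); then Z ~ CN_p(mu, B^H B)."""
    p = len(mu)
    s = math.sqrt(0.5)
    x = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(p)]
    return [mu[j] + sum(x[i] * B[i][j] for i in range(p)) for j in range(p)]


def herm_form(c, M):
    """c^H M c for a column vector c stored as a list."""
    p = len(c)
    return sum(c[j].conjugate() * M[j][l] * c[l] for j in range(p) for l in range(p))


B = [[1, 0.5j], [0, 1]]
mus = [[1, 0], [0, 1j], [0, 0], [1 + 1j, 0]]
c = [1, 1j]
p, n = 2, len(mus)
sigma = [[sum(B[i][j].conjugate() * B[i][l] for i in range(p)) for l in range(p)] for j in range(p)]
delta = [[sum(mu[j].conjugate() * mu[l] for mu in mus) for l in range(p)] for j in range(p)]
scale = herm_form(c, sigma).real            # c^H Sigma c
lam = 2 * herm_form(c, delta).real / scale  # 2 c^H delta c / c^H Sigma c

rng = random.Random(11)
reps = 20000
draws = []
for _rep in range(reps):
    total = 0.0
    for mu in mus:
        z = cn_row(rng, mu, B)
        # c^H Z_i^H Z_i c = |Z_i c|^2, so W itself is never formed
        total += abs(sum(z[j] * c[j] for j in range(p))) ** 2
    draws.append(2 * total / scale)
mean_mc = sum(draws) / reps
var_mc = sum((d - mean_mc) ** 2 for d in draws) / (reps - 1)
```

Working with $|Z_i c|^2$ mirrors the proof: $Z_i c$ is a scalar $CN_1(\mu_i c, c^H\Sigma c)$ variable, which is exactly the $p = 1$ setting of theorem 53.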

Let $W = UL^2U^H$ be the eigenvalue decomposition of $W \sim CW_p(n, \Sigma, \delta)$. Consider a linear combination of the sample eigenvalues, given by $c^H L^2 c$, where $c$ is a $p \times 1$ vector of known fixed constants. Then by theorem 54 we have

$$c^H L^2 c = c^H U^H W U c \sim CW_1(n, c^H U^H \Sigma U c, c^H U^H \delta U c)$$

Note that $c^H L^2 c$ is a scalar, as are now all the parameters of the distribution. Then by theorem 53, when $c^H U^H \Sigma U c \neq 0$,

$$\frac{2c^H L^2 c}{c^H U^H \Sigma U c} \sim \chi^2_{2n}\left(\frac{2c^H U^H \delta U c}{c^H U^H \Sigma U c}\right)$$

We know that when $U$ is unitary, the similarity transformation $U^H W U$ has the same eigenvalues as $W$. Then, when we let $c$ be a column vector of ones, we get

$$\frac{2\,\mathrm{tr}(W)}{c^H U^H \Sigma U c} \sim \chi^2_{2n}\left(\frac{2c^H U^H \delta U c}{c^H U^H \Sigma U c}\right)$$
Remark. Because $U^H \Sigma U$ is not generally a diagonal matrix, we conclude that the sample eigenvalues are generally not independent. Thus, disjoint linear combinations of sample eigenvalues from the same sample set are generally not independent. Therefore, the ratio of disjoint linear combinations of sample eigenvalues from the same sample set is generally not F-distributed.

Now, consider the case when $C = (C_1, C_2)$ is a $p \times 2$ matrix. Look at

$$C^H L^2 C = \begin{pmatrix} C_1^H L^2 C_1 & C_1^H L^2 C_2 \\ C_2^H L^2 C_1 & C_2^H L^2 C_2 \end{pmatrix}$$

This is distributed as

$$C^H L^2 C \sim CW_2(n, C^H U^H \Sigma U C, C^H U^H \delta U C)$$

Now, the matrix is an ordered set of 4 random variables. We have taken linear combinations to force two of the sample values to zero. Note that since $W = W^H$, the matrix $C^H L^2 C$ is Hermitian, so we really have only 3 random variables.

We will pick up on this theme again when we discuss the density functions of sample eigenvalues in a later section. We next consider a projection theorem.

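Theorem 58 Let $Z \sim CN_{n,p}(\mu, I, \Sigma)$, let $V$ be a $k$-dimensional subspace of $\mathbb{C}^n$, and let $P_V$ be the orthogonal projection operator from $\mathbb{C}^n$ onto $V$. Then

$$Z^H P_V Z \sim CW_p(k, \Sigma, \mu^H P_V \mu)$$

This is a complexification of Arnold's theorem 11.1(a) [31].

Proof. This is a complexification of Arnold's proof. Let $U$ be an $n \times k$ matrix whose columns form an orthonormal basis for $V$, so that $P_V = UU^H$. Then by theorem 41,

$$U^H Z \sim CN_{k,p}(U^H\mu, I_k, \Sigma)$$

Then

$$(U^H Z)^H (U^H Z) = Z^H U U^H Z = Z^H P_V Z \sim CW_p(k, \Sigma, \mu^H P_V \mu)$$

where

$$(U^H\mu)^H (U^H\mu) = \mu^H U U^H \mu = \mu^H P_V \mu$$

is the noncentrality parameter. Arnold comments that if $A$ is a Hermitian idempotent $n \times n$ matrix of rank $k$, then

$$Z^H A Z \sim CW_p(k, \Sigma, \mu^H A \mu)$$

□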
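Lemma 17 Let $Z \sim CN_{n,p}(\mu, I, \Sigma)$. Then $AZ$ and $BZ$ are independent if $AB^H = 0$. This is a complexification of part 1 of Arnold's theorem 17.7(b) [31], which was stated without proof. This result differs from Arnold's in that independence does not imply $AB^H = 0$.

Proof. Let $C = \begin{pmatrix} A \\ B \end{pmatrix}$, let $Y = CZ$, and partition $T = \begin{pmatrix} T_1 \\ T_2 \end{pmatrix}$ conformably. Then

$$
\begin{aligned}
\phi_Y(T) &= \phi_{CZ}(T) \\
&= \exp\left\{i\,\mathrm{Re}\left[\mathrm{tr}(T_1^H A\mu + T_2^H B\mu)\right] - \tfrac{1}{4}\,\mathrm{tr}\left[(T_1^H A + T_2^H B)(A^H T_1 + B^H T_2)\Sigma\right]\right\} \\
&= \exp\left\{i\,\mathrm{Re}\left[\mathrm{tr}(T_1^H A\mu + T_2^H B\mu)\right] - \tfrac{1}{4}\,\mathrm{tr}\left(T_1^H AA^H T_1\Sigma + T_1^H AB^H T_2\Sigma + T_2^H BA^H T_1\Sigma + T_2^H BB^H T_2\Sigma\right)\right\} \\
&= \exp\left\{i\,\mathrm{Re}\left[\mathrm{tr}(T_1^H A\mu)\right] - \tfrac{1}{4}\,\mathrm{tr}(T_1^H AA^H T_1\Sigma)\right\} \\
&\quad \times \exp\left\{i\,\mathrm{Re}\left[\mathrm{tr}(T_2^H B\mu)\right] - \tfrac{1}{4}\,\mathrm{tr}(T_2^H BB^H T_2\Sigma)\right\} \\
&\quad \times \exp\left\{-\tfrac{1}{4}\,\mathrm{tr}\left[(T_1^H AB^H T_2 + T_2^H BA^H T_1)\Sigma\right]\right\}
\end{aligned}
$$

If

$$\phi_{CZ}(T) = \phi_{AZ}(T_1)\,\phi_{BZ}(T_2)$$

then

$$\mathrm{tr}\left[(T_1^H AB^H T_2 + T_2^H BA^H T_1)\Sigma\right] = 0$$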
for all $T_1, T_2$. Consider particular choices of $T_1$, $T_2$, $AB^H \neq 0$, and $\Sigma$ for which this trace evaluates to zero. Therefore independence does not imply $AB^H = 0$. If $AB^H = 0$, then

$$\phi_{CZ}(T) = \phi_{AZ}(T_1)\,\phi_{BZ}(T_2)$$

so $AZ$ and $BZ$ are independent. □

Lemma 18 Let $Z \sim CN_{n,p}(\mu, I, \Sigma)$. Let $B$ be nonnegative definite and $AB = 0$. Then $AZ$ and $Z^H BZ$ are independent. This is a complexification of part 2 of Arnold's theorem 17.7(b) [31], which was stated without proof.

Proof. Follow the proof of Arnold's theorem 1.13. Let $B = E^H E$ and let $\mathrm{rank}(E_{r \times n}) = r$. Then $\mathrm{rank}(B) = r$. Suppose $AB = 0$. Then $ABE^H = 0 = AE^H(EE^H)$. However, $\mathrm{rank}(EE^H) = r$, so $EE^H$ is invertible. Therefore $AE^H = 0$. By lemma 17, this implies $AZ$ and $EZ$ are independent. This, in turn, implies that $AZ$ and

$$(EZ)^H (EZ) = Z^H E^H E Z = Z^H B Z$$

are independent. □


Theorem 59 Let $Z \sim CN_{n,p}(\mu, I, \Sigma)$. Let $A$ and $B$ be nonnegative definite and $AB = 0$. Then $Z^H AZ$ and $Z^H BZ$ are independent. This is a complexification of part 3 of Arnold's theorem 17.7(b) [31], which was stated without proof.

Proof. Let $A = D^H D$ where $D$ is $s \times n$ and $\mathrm{rank}(A) = s$. Let $B = E^H E$ where $E$ is $r \times n$ and $\mathrm{rank}(B) = r$. When $AB = 0$, then $DABE^H = 0$, which implies $DD^H DE^H EE^H = 0$. However, $DD^H$ and $EE^H$ are of full rank. Therefore, $DE^H = 0$, which implies $DZ$ and $EZ$ are independent by lemma 17. Then

$$(DZ)^H (DZ) = Z^H D^H DZ = Z^H AZ$$

is independent of

$$(EZ)^H (EZ) = Z^H E^H EZ = Z^H BZ$$

□
Theorem 60 Let $Z \sim CN_{n,p}(\mu, \Xi, \Sigma)$ where $\Xi$ and $\Sigma$ are positive definite and $n > p$. Then

$$\Pr\{\mathrm{rank}(Z) = p\} = 1$$

This is a complexification of Arnold's theorem 17.8(a) [31].

Proof. This is a complexification of Arnold's proof. Let $Z = (Z_1, \ldots, Z_p)$ be partitioned into columns and let $S_k(Z_1, \ldots, Z_k)$ be the subspace of $\mathbb{C}^n$ spanned by $(Z_1, \ldots, Z_k)$. Since the conditional distribution of $Z_{k+1} \mid (Z_1, \ldots, Z_k)$ is a nonsingular vector complex normal distribution by theorem 44 or theorem 47,

$$\Pr\{Z_{k+1} \in S_k(Z_1, \ldots, Z_k) \mid (Z_1, \ldots, Z_k)\} = 0$$

if $k < n$, because $S_k$ is a subspace of dimension at most $k$. Therefore

$$\Pr\{Z_{k+1} \in S_k(Z_1, \ldots, Z_k)\} = \mathcal{E}\left\{\Pr[Z_{k+1} \in S_k(Z_1, \ldots, Z_k) \mid (Z_1, \ldots, Z_k)]\right\} = 0$$

Finally,

$$\Pr(Z_1, \ldots, Z_p \text{ are linearly dependent}) \leq \sum_{k=1}^{p-1} \Pr[Z_{k+1} \in S_k(Z_1, \ldots, Z_k)] = 0$$

Therefore, $\Pr\{\mathrm{rank}(Z) = p\} = 1$. □

Theorem 61 Let $Z \sim CN_{n,p}(\mu, \Xi, \Sigma)$ where $\Xi$ and $\Sigma$ are positive definite and $n > p$. Let $a \in \mathbb{C}^n$ such that $a \neq 0$. Let $(a, Z)$ be the matrix $Z$ augmented by the vector $a$. Then $\Pr\{\mathrm{rank}(a, Z) = p + 1\} = 1$. This is a complexification of Arnold's theorem 17.8(b) [31], which was stated without proof.

Proof. I have not proven this theorem. Since Arnold states the theorem for $\mathbb{R}^n$ and the proof depends only on geometric concepts, it should also hold in $\mathbb{C}^n$. This theorem is retained since it may be useful for updating algorithms in adaptive signal processing.

Corollary 18 Let $W \sim CW_p(n, \Sigma, \delta)$. If $n > p$ and $\Sigma$ is positive definite, then $\Pr\{W > 0\} = 1$. This is a complexification of Arnold's corollary to his theorem 17.8 [31].

Proof. This is a complexification of Arnold's proof. The rank of $W = Z^H Z$ is the same as the rank of $Z$. See Arnold's lemma A.9 and theorem A.3, with straightforward extensions to the complex case. By theorem 60, if $\Sigma$ is positive definite and $n > p$, then $\Pr\{\mathrm{rank}(W) = p\} = 1$. Since $W$ is $p \times p$ of full rank $p$, it is nonsingular with probability 1. Hence $W > 0$ with probability 1. □

Since $W$ is invertible with probability 1 if $\Sigma > 0$ and $n > p$, $W$ then has the nonsingular complex Wishart distribution. Otherwise, $W$ has a singular complex Wishart distribution, which is sometimes called a complex pseudo-Wishart distribution.

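Lemma 19 Let $W \sim CW_p(n, \Sigma, 0) = CW_p(n, \Sigma)$ where $\Sigma > 0$ and $n > p$. Partition $W$ and $\Sigma$ such that $W_{11}$ and $\Sigma_{11}$ are both $q \times q$ matrices. Then

$$W_{11.2} \sim CW_q(n - p + q, \Sigma_{11.2})$$

This is a restatement and complexification of Arnold's lemma 17.9 [31], and it is also a complex version of Anderson's theorem 7.3.6 [26].

Proof. The following is an expansion and complexification of Arnold's proof. Let

$$W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

and define

$$T = W_{22}, \qquad U = W_{22}^{-1} W_{21}, \qquad V = W_{11} - W_{12} W_{22}^{-1} W_{21} = W_{11.2}$$

By theorem 55,

$$T = W_{22} \sim CW_{p-q}(n, \Sigma_{22}, 0) = CW_{p-q}(n, \Sigma_{22})$$

Let

$$(Y_{n \times q}, X_{n \times (p-q)}) \sim CN_{n,p}(0, I, \Sigma)$$

and let $W = (Y, X)^H (Y, X)$. Then

$$\begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix} = \begin{pmatrix} Y^H Y & Y^H X \\ X^H Y & X^H X \end{pmatrix}$$

By the definition of the complex Wishart distribution, $W \sim CW_p(n, \Sigma)$. Substituting into $T$, $U$, and $V$, we get

$$T = X^H X, \qquad U = (X^H X)^{-1} X^H Y, \qquad V = Y^H\left[I - X(X^H X)^{-1} X^H\right] Y$$

Now, find the conditional distribution of $(U, V) \mid X$. Since $(Y, X) \sim CN_{n,p}(0, I, \Sigma)$ and $Y$ is $n \times q$, then by theorem 44,

$$(Y \mid X) \sim CN_{n,q}\left(X\Sigma_{22}^{-1}\Sigma_{21}, I, \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right) \qquad \text{(D.54)}$$

Now,

$$(U \mid X) = \left((X^H X)^{-1} X^H Y \mid X\right)$$

which is distributed according to

$$CN_{(p-q),q}\left((X^H X)^{-1} X^H X \Sigma_{22}^{-1}\Sigma_{21},\ (X^H X)^{-1} X^H X (X^H X)^{-1},\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$$

or

$$(U \mid X) \sim CN_{(p-q),q}\left(\Sigma_{22}^{-1}\Sigma_{21},\ (X^H X)^{-1},\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right) \qquad \text{(D.55)}$$

by theorem 41, where $[(X^H X)^{-1}]^H = (X^H X)^{-1}$. Since the conditional distribution of $(U \mid X)$ depends on $X$ only through $T = X^H X$, we can write

$$
\begin{aligned}
(U \mid T) &\sim CN_{(p-q),q}\left(\Sigma_{22}^{-1}\Sigma_{21},\ T^{-1},\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right) \\
&= CN_{(p-q),q}\left(\Sigma_{22}^{-1}\Sigma_{21},\ T^{-1},\ \Sigma_{11.2}\right)
\end{aligned}
$$

Consider

$$
\begin{aligned}
&\left[I - X(X^H X)^{-1} X^H\right]\left[I - X(X^H X)^{-1} X^H\right] \\
&\quad = I - X(X^H X)^{-1} X^H - X(X^H X)^{-1} X^H + X(X^H X)^{-1} X^H X (X^H X)^{-1} X^H \\
&\quad = I - 2X(X^H X)^{-1} X^H + X(X^H X)^{-1} X^H = I - X(X^H X)^{-1} X^H = P_V
\end{aligned}
$$

and therefore this matrix is idempotent. $X$ is of rank $p - q$, and so $P_V$ is of rank $n - (p - q)$.

Consider

$$
\begin{aligned}
&(X\Sigma_{22}^{-1}\Sigma_{21})^H \left[I - X(X^H X)^{-1} X^H\right] (X\Sigma_{22}^{-1}\Sigma_{21}) \\
&\quad = \Sigma_{21}^H \Sigma_{22}^{-H} X^H \left[I - X(X^H X)^{-1} X^H\right] X \Sigma_{22}^{-1}\Sigma_{21} \\
&\quad = \Sigma_{21}^H \Sigma_{22}^{-1} X^H X \Sigma_{22}^{-1}\Sigma_{21} - \Sigma_{21}^H \Sigma_{22}^{-1} X^H X \Sigma_{22}^{-1}\Sigma_{21} = 0
\end{aligned}
$$

where $\Sigma_{22}^{-H} = \Sigma_{22}^{-1}$. Since we know the distribution of $(Y \mid X)$, then by theorem 58

$$
\begin{aligned}
(V \mid X) = (Y^H P_V Y \mid X) &\sim CW_q\left(n - p + q,\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right) \\
&= CW_q(n - p + q,\ \Sigma_{11.2}) && \text{(D.56)}
\end{aligned}
$$

Also note

$$
\begin{aligned}
\left[(X^H X)^{-1} X^H\right]\left[I - X(X^H X)^{-1} X^H\right] &= A P_V \\
&= (X^H X)^{-1} X^H - (X^H X)^{-1} X^H X (X^H X)^{-1} X^H \\
&= (X^H X)^{-1} X^H - (X^H X)^{-1} X^H = 0
\end{aligned}
$$

where $A = (X^H X)^{-1} X^H$ and $P_V = I - X(X^H X)^{-1} X^H$. By lemma 18, $(AY \mid X) = (U \mid X)$ and $(Y^H P_V Y \mid X) = (V \mid X)$ are independent. This implies that $(U \mid T)$ is independent of $(V \mid T)$. In turn, this implies that

$$f(U, V \mid T) = f(V \mid T)\, f(U \mid T)$$

Thus

$$f(V \mid (U, T)) = \frac{f(U, V \mid T)}{f(U \mid T)} = f(V \mid T) = f(V \mid X)$$

Finally,

$$
\begin{aligned}
W_{11.2} = V \mid (U, T) &\sim CW_q\left(n - p + q,\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right) \\
&= CW_q(n - p + q,\ \Sigma_{11.2})
\end{aligned}
$$

This completes the proof. □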

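Corollary 19 Let $W \sim CW_p(n, \Sigma, 0) = CW_p(n, \Sigma)$ where $\Sigma > 0$ and $n > p$. Partition $W$ and $\Sigma$ such that $W_{11}$ and $\Sigma_{11}$ are both scalars. Let

$$\sigma^2 = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \Sigma_{11.2}$$

Then $\frac{2}{\sigma^2} W_{11.2} \sim \chi^2_{2(n-p+1)}(0)$. This is a formalization and complexification of Arnold's corollary to his lemma 17.9 for the case of $q = 1$ [31]. Results of this special case are useful in test statistics of quadratic forms.

Proof. This is an expansion and complexification of Arnold's proof. Let $q = 1$ in lemma 19 and let $\beta = \Sigma_{22}^{-1}\Sigma_{21}$. By equation D.54 and lemma 12 we have $(Y \mid X) \sim CN_n(X\beta, \sigma^2 I)$. We started with $W \sim CW_p(n, \Sigma)$ with $\Sigma > 0$ and $n > p$. Thus the matrix $(Y, X)_{n \times p}$ is of full column rank $p$ with probability 1 (theorem 60), which implies

$$\mathrm{rank}(X_{n \times (p-1)}) = p - 1$$

Let $U = (X^H X)^{-1} X^H Y$ and

$$V = Y^H\left[I - X(X^H X)^{-1} X^H\right] Y$$

Then

$$(U \mid X) \sim CN_{p-1,1}\left(\beta, (X^H X)^{-1}, \sigma^2\right) = CN_{p-1}\left(\beta, \sigma^2 (X^H X)^{-1}\right)$$

by equation D.55 and lemma 12. Thus

$$(V \mid X) \sim CW_1(n - p + 1, \sigma^2)$$

and

$$\left(\frac{2}{\sigma^2} V \,\middle|\, X\right) \sim \chi^2_{2(n-p+1)}(0)$$

by equation D.56 and lemma 15. Let $T = X^H X$. Then by lemma 19 and lemma 12,

$$V \mid (U, T) \sim CW_1(n - p + 1, \sigma^2)$$

and therefore

$$\frac{2}{\sigma^2} V \,\Big|\, (U, T) \sim \chi^2_{2(n-p+1)}(0)$$

□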
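Corollary 19 can be sanity-checked numerically for $p = 2$, $q = 1$. In this sketch (illustrative; $B$, $n$, the seed, and the tolerances are assumptions, and the row generator `cn_row` is the hypothetical $Z_i = X_i B$ with $X_i$ a row of iid $CN_1(0,1)$ variables), $W_{11.2} = W_{11} - |W_{12}|^2/W_{22}$ because $W_{21} = \bar W_{12}$, and $\frac{2}{\sigma^2}W_{11.2}$ should behave like $\chi^2_{2(n-1)}(0)$, with mean $2(n-1)$ and variance $4(n-1)$.

```python
import math
import random


def cn_row(rng, mu, B):
    """One row Z = mu + X B, X a 1 x p row of iid CN_1(0, 1); then Z ~ CN_p(mu, B^H B)."""
    p = len(mu)
    s = math.sqrt(0.5)
    x = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(p)]
    return [mu[j] + sum(x[i] * B[i][j] for i in range(p)) for j in range(p)]


B = [[1, 0.5j], [0, 1]]
p, n = 2, 5
# Entries of Sigma = B^H B: Sigma_11 = 1, Sigma_12 = 0.5i, Sigma_22 = 1.25
s11 = sum(abs(B[i][0]) ** 2 for i in range(p))
s22 = sum(abs(B[i][1]) ** 2 for i in range(p))
s12 = sum(B[i][0].conjugate() * B[i][1] for i in range(p))
sigma2 = s11 - abs(s12) ** 2 / s22   # Sigma_{11.2}, equal to 0.8 here

rng = random.Random(3)
reps = 40000
draws = []
for _rep in range(reps):
    w11 = w22 = 0.0
    w12 = 0j
    for _row in range(n):            # central case: every row has mean zero
        z = cn_row(rng, [0, 0], B)
        w11 += abs(z[0]) ** 2
        w22 += abs(z[1]) ** 2
        w12 += z[0].conjugate() * z[1]
    w11_2 = w11 - abs(w12) ** 2 / w22   # W_11 - W_12 W_22^{-1} W_21
    draws.append(2 * w11_2 / sigma2)
mean_mc = sum(draws) / reps
var_mc = sum((d - mean_mc) ** 2 for d in draws) / (reps - 1)
dof = 2 * (n - p + 1)                 # 8 degrees of freedom here
```

The loss of $2(p - q) = 2$ degrees of freedom relative to $W_{11}$ alone (which has $2n$) is the feature this check is meant to expose.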

D.4 Distribution of Hotelling's $T^2$ for Complex Variables

Hotelling's $T^2$ is a classical statistic for testing means. It is a likelihood ratio test statistic. This result is provided because it is an easy result which naturally falls at this point in the general theory of complex multivariate analysis.

Definition 7 Hotelling's $T^2$ for complex variables. Let $Z$ and $W$ be independent, $Z \sim CN_p(\mu, \Sigma)$ and $W \sim CW_p(n, \Sigma)$, where $n > p$ and $\Sigma > 0$. By corollary 18, $\Pr\{W > 0\} = 1$. Define

$$F = \frac{n - p + 1}{p} Z^H W^{-1} Z$$

Then

$$T^2 = \frac{np}{n - p + 1} F$$

is called Hotelling's $T^2$. This is a complexification of Arnold's definition [31] in his equation 17.21 and accompanying discussion.

Unlike the case of real random variables, for complex variables the case of $p = 1$ does not yield the square of a random variable having a noncentral $t$-distribution. The careful reader will realize that this is merely due to $\mathbb{C} \cong \mathbb{R}^2$. When $p = 1$ and $\Sigma = \sigma^2$, then $F = n|Z|^2/W \sim F_{2,2n}(2|\mu|^2/\sigma^2)$.

In this section it is shown that $F$ has an $F$-distribution even when $p > 1$. We prepare for our journey with the following theorem and corollary.

Theorem 62 Let $A \in \mathbb{C}^{p \times k}$ and $Z \sim CN_{n,p}(\mu, I, \Sigma)$, and let $W = Z^H Z$. Then

$$A^H W A \sim CW_k(n, A^H \Sigma A, A^H \delta A)$$

where $\delta = \mu^H \mu$.

Proof. Let $W = Z^H Z$ and $Y = ZA$. Then

$$Y^H Y = (ZA)^H (ZA) = A^H Z^H Z A = A^H W A$$

By theorem 41 we know $ZA \sim CN_{n,k}(\mu A, I, A^H \Sigma A)$. Further,

$$(\mu A)^H (\mu A) = A^H \mu^H \mu A = A^H \delta A$$

and by lemma 14 we know $A^H W A \sim CW_k(n, A^H \Sigma A, A^H \delta A)$. □

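Theorem 63 Let $Y \sim CN_n(\mu, \Sigma)$ where $\Sigma$ is positive definite. Then

$$2Y^H \Sigma^{-1} Y \sim \chi^2_{2n}(2\mu^H \Sigma^{-1} \mu)$$

Proof. Let $Z = \Sigma^{-1/2} Y$. Then

$$Z^H Z = Y^H \Sigma^{-1/2} \Sigma^{-1/2} Y = Y^H \Sigma^{-1} Y$$

Let $A \in \mathbb{C}^{m \times n}$ and $c \in \mathbb{C}^m$. By theorem 18,

$$
\begin{aligned}
\phi_{AY+c}(t) &= \exp\left[i\,\mathrm{Re}(t^H c)\right] \phi_Y(A^H t) \\
&= \exp\left[i\,\mathrm{Re}(t^H c)\right] \exp\left\{i\,\mathrm{Re}\left([A^H t]^H \mu\right) - \tfrac{1}{4}(A^H t)^H \Sigma (A^H t)\right\} \\
&= \exp\left[i\,\mathrm{Re}(t^H c)\right] \exp\left\{i\,\mathrm{Re}(t^H A\mu) - \tfrac{1}{4} t^H A\Sigma A^H t\right\} \\
&= \exp\left\{i\,\mathrm{Re}\left(t^H [A\mu + c]\right) - \tfrac{1}{4} t^H (A\Sigma A^H) t\right\}
\end{aligned}
$$

Therefore

$$AY + c \sim CN_m(A\mu + c, A\Sigma A^H)$$

which implies

$$Z = \Sigma^{-1/2} Y \sim CN_n(\Sigma^{-1/2}\mu, I)$$

Then

$$Y^H \Sigma^{-1} Y = Z^H Z = W \sim CW_1(n, 1, \mu^H \Sigma^{-1} \mu)$$

by the definition of the noncentral complex Wishart distribution. Thus by lemma 15,

$$2Y^H \Sigma^{-1} Y \sim \chi^2_{2n}(2\mu^H \Sigma^{-1} \mu)$$

This completes the proof. □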

With  this  preparation,  we  are  ready  to  begin. 

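Theorem 64 Let $W \sim CW_p(n, \Sigma)$ where $n > p$ and $\Sigma > 0$. Let $a \in \mathbb{C}^p$ such that $a \neq 0$. Then

$$\frac{2a^H \Sigma^{-1} a}{a^H W^{-1} a} \sim \chi^2_{2(n-p+1)}(0)$$

This is a complexification of Arnold's lemma 17.10. This is a good example that shows that complexification involves more than merely changing the transpose to the Hermitian transpose.

Proof. This is an expansion and complexification of Arnold's proof. Let

$$a_0 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

and partition $W$ and $\Sigma$ so that $W_{11}$ and $\Sigma_{11}$ are scalars:

$$W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

Let $V = W^{-1}$, partitioned in the same way. Then

$$V_{11} = a_0^H W^{-1} a_0, \qquad W^{-1} = V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$$

By Arnold's lemma A.2(b) [31], which applies to complex as well as to real matrices,

$$V_{11} = \left(W_{11} - W_{12} W_{22}^{-1} W_{21}\right)^{-1}$$

Thus

$$a_0^H W^{-1} a_0 = \left(W_{11} - W_{12} W_{22}^{-1} W_{21}\right)^{-1}$$

Similarly,

$$a_0^H \Sigma^{-1} a_0 = \left(\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}\right)^{-1}$$

For the special case of $k = 1$ and $\mu = 0$,

$$a_0^H W a_0 \sim CW_1(n, a_0^H \Sigma a_0) = CW_1(n, \Sigma_{11})$$

We will take advantage of the fact that $a_0^H W^{-1} a_0$ and $a_0^H \Sigma^{-1} a_0$ are scalars:

$$\frac{a_0^H \Sigma^{-1} a_0}{a_0^H W^{-1} a_0} = \frac{\left(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)^{-1}}{\left(W_{11} - W_{12}W_{22}^{-1}W_{21}\right)^{-1}} = \frac{W_{11} - W_{12}W_{22}^{-1}W_{21}}{\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}$$

From corollary 19, let

$$V = W_{11} - W_{12} W_{22}^{-1} W_{21}, \qquad \sigma^2 = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},$$

$T = W_{22}$, and $U = W_{22}^{-1} W_{21}$. Lemma 19 established that $f(V) = f(V \mid (U, T))$ and $\frac{2}{\sigma^2} V \sim \chi^2_{2(n-p+1)}(0)$. Thus we get

$$\frac{2\left(W_{11} - W_{12}W_{22}^{-1}W_{21}\right)}{\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}} = \frac{2V}{\sigma^2} \sim \chi^2_{2(n-p+1)}(0)$$

or

$$\frac{2a_0^H \Sigma^{-1} a_0}{a_0^H W^{-1} a_0} \sim \chi^2_{2(n-p+1)}(0)$$

So the lemma is true for $a = a_0$.

Now, let $a = Aa_0$ where $A$ is invertible, so that the vector $a$ is the first column of $A$. By theorem 54, for $B \in \mathbb{C}^{p \times p}$ and $W \sim CW_p(n, \Sigma, \delta)$,

$$BWB^H \sim CW_p(n, B\Sigma B^H, B\delta B^H)$$

Let $B = A^{-1}$ and $\delta = 0$. Then

$$A^{-1} W A^{-H} \sim CW_p(n, A^{-1} \Sigma A^{-H})$$

Thus

$$
\begin{aligned}
\frac{2a^H \Sigma^{-1} a}{a^H W^{-1} a} &= \frac{2(Aa_0)^H \Sigma^{-1} (Aa_0)}{(Aa_0)^H W^{-1} (Aa_0)} = \frac{2a_0^H (A^H \Sigma^{-1} A) a_0}{a_0^H (A^H W^{-1} A) a_0} \\
&= \frac{2a_0^H (A^{-1} \Sigma A^{-H})^{-1} a_0}{a_0^H (A^{-1} W A^{-H})^{-1} a_0} \sim \chi^2_{2(n-p+1)}(0)
\end{aligned}
$$

by our proof with $a_0$. □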
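Theorem 64 can also be checked by simulation for a fixed vector $a$ that is not a coordinate vector. The sketch below is illustrative only: the row generator `cn_row` (the hypothetical $Z_i = X_i B$ with iid $CN_1(0,1)$ entries in $X_i$), $B$, $a$, the seed, and the tolerances are assumptions. It hand-codes a $2 \times 2$ inverse and compares the ratio $2a^H\Sigma^{-1}a/(a^HW^{-1}a)$ with $\chi^2_{2(n-p+1)}(0)$, whose mean is $2(n-p+1)$ and variance $4(n-p+1)$.

```python
import math
import random


def cn_row(rng, mu, B):
    """One row Z = mu + X B, X a 1 x p row of iid CN_1(0, 1); then Z ~ CN_p(mu, B^H B)."""
    p = len(mu)
    s = math.sqrt(0.5)
    x = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(p)]
    return [mu[j] + sum(x[i] * B[i][j] for i in range(p)) for j in range(p)]


def inv2(M):
    """Inverse of a 2 x 2 complex matrix stored as nested lists."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def herm_form(c, M):
    """c^H M c for a column vector c stored as a list."""
    p = len(c)
    return sum(c[j].conjugate() * M[j][l] * c[l] for j in range(p) for l in range(p))


B = [[1, 0.5j], [0, 1]]
p, n = 2, 6
a = [1, 1j]                                     # any fixed nonzero vector
sigma = [[sum(B[i][j].conjugate() * B[i][l] for i in range(p)) for l in range(p)] for j in range(p)]
num = 2 * herm_form(a, inv2(sigma)).real        # 2 a^H Sigma^{-1} a

rng = random.Random(9)
reps = 30000
draws = []
for _rep in range(reps):
    W = [[0j, 0j], [0j, 0j]]
    for _row in range(n):                       # central case: zero-mean rows
        z = cn_row(rng, [0, 0], B)
        for j in range(p):
            for l in range(p):
                W[j][l] += z[j].conjugate() * z[l]
    draws.append(num / herm_form(a, inv2(W)).real)
mean_mc = sum(draws) / reps
var_mc = sum((d - mean_mc) ** 2 for d in draws) / (reps - 1)
dof = 2 * (n - p + 1)                           # mean dof, variance 2 * dof
```

Choosing $a$ off the coordinate axes exercises the second half of the proof, where the general case is reduced to $a_0$ through the transformation $A^{-1}WA^{-H}$.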


Theorem  65  Let  Z  and  W  be  independent  where  Z  CN,ip,E),  W 
CWp(n,S),  n  >  p,  and  S  >  0.  Then 

P  =  ~  F2p.2(n-p+l)(2/x"S-V) 
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This  is  a  complexification  of  Arnold's  theorem  17.11  [31]. 


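Proof. This is an expansion and complexification of Arnold's proof. Let $U = Z^H\Sigma^{-1}Z$ and $V = Z^H\Sigma^{-1}Z / Z^H W^{-1} Z$. For any fixed value of $Z$, by theorem 64 we know

$$2(V \mid Z) \sim \chi^2_{2(n-p+1)}(0).$$

Because this conditional distribution does not depend on $Z$, $V$ is independent of $Z$, and therefore $V$ is also independent of $U$. Therefore $2V \sim \chi^2_{2(n-p+1)}(0)$. By theorem 63 we know $2U \sim \chi^2_{2p}(2\mu^H\Sigma^{-1}\mu)$. In the case of real variables, the result for $V$ is similar to Muirhead's theorem 3.2.12. Continuing, we form the ratio for the F-statistic:

$$\frac{\frac{1}{2p}\,2U}{\frac{1}{2(n-p+1)}\,2V} = \frac{n-p+1}{p}\cdot\frac{Z^H\Sigma^{-1}Z}{Z^H\Sigma^{-1}Z / Z^H W^{-1} Z} \sim \frac{\frac{1}{2p}\chi^2_{2p}(2\mu^H\Sigma^{-1}\mu)}{\frac{1}{2(n-p+1)}\chi^2_{2(n-p+1)}(0)} = F_{2p,\,2(n-p+1)}(2\mu^H\Sigma^{-1}\mu)$$

Therefore

$$F = \frac{n-p+1}{p}\, Z^H W^{-1} Z \sim F_{2p,\,2(n-p+1)}(2\mu^H\Sigma^{-1}\mu).$$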

Maximum likelihood estimates of $\mu$ and $\Sigma$ are statistically independent and can be the random variables used in this statistic. This distribution is useful in testing hypotheses about $\mu$ and for establishing confidence regions for $\mu$.

… is the likelihood ratio for testing $H: \mu = \mu_0$. Discussion can be found beginning on p. 159 of Anderson [26].
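A rough simulation check of Theorem 65 is sketched below, assuming NumPy; the particular $\Sigma$, $\mu$, and sample sizes are hypothetical choices. It draws independent $Z \sim CN_p(\mu, \Sigma)$ and $W \sim CW_p(n, \Sigma)$ (the latter as a sum of $n$ outer products), forms $F = \frac{n-p+1}{p} Z^H W^{-1} Z$, and compares its sample mean with the mean of a noncentral $F_{d_1, d_2}(\lambda)$, namely $d_2(d_1 + \lambda)/[d_1(d_2 - 2)]$, with $d_1 = 2p$, $d_2 = 2(n-p+1)$, and $\lambda = 2\mu^H\Sigma^{-1}\mu$.

```python
import numpy as np

def complex_normal(shape, rng):
    """Standard complex normal entries with E|g|^2 = 1."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(1)
p, n, reps = 3, 6, 20000
sigma = np.array([[2.0, 0.5 + 0.5j, 0.0],
                  [0.5 - 0.5j, 1.0, 0.2j],
                  [0.0, -0.2j, 1.5]])
mu = np.array([0.5, 0.0, 0.0])
L = np.linalg.cholesky(sigma)

# Independent draws: rows of Z are CN_p(mu, sigma); each W[r] is CW_p(n, sigma).
Z = mu + complex_normal((reps, p), rng) @ L.T
G = L @ complex_normal((reps, p, n), rng)
W = G @ np.conj(np.swapaxes(G, -1, -2))

# F = (n - p + 1)/p * Z^H W^{-1} Z for each replicate.
x = np.linalg.solve(W, Z[..., None])[..., 0]
F = (n - p + 1) / p * np.real(np.sum(np.conj(Z) * x, axis=-1))

# Mean of a noncentral F(d1, d2, lam) with d2 > 2 is d2 (d1 + lam) / (d1 (d2 - 2)).
d1, d2 = 2 * p, 2 * (n - p + 1)
lam = 2 * np.real(np.conj(mu) @ np.linalg.solve(sigma, mu))  # 2 mu^H sigma^{-1} mu
expected_mean = d2 * (d1 + lam) / (d1 * (d2 - 2))
print(F.mean(), expected_mean)
```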
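Theorem 66 Let $Z \sim CN_p(\mu, S)$ such that $S > 0$, $W \sim CW_p(n, \Sigma)$ such that $\Sigma > 0$ and $n > p$, and $Y \in \mathbb{C}^p$. Then

$$\frac{n-p+1}{p}\cdot\frac{(Z^H S^{-1} Z)(Y^H W^{-1} Y)}{Y^H\Sigma^{-1}Y} \sim F_{2p,\,2(n-p+1)}(2\mu^H S^{-1}\mu).$$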

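Proof. By theorem 62 we know

$$\frac{2Y^H\Sigma^{-1}Y}{Y^H W^{-1} Y} \sim \chi^2_{2(n-p+1)}(0).$$

From theorem 65,

$$2Z^H S^{-1} Z \sim \chi^2_{2p}(2\mu^H S^{-1}\mu).$$

This implies

$$\frac{\frac{1}{2p}\,2Z^H S^{-1} Z}{\frac{1}{2(n-p+1)}\cdot\frac{2Y^H\Sigma^{-1}Y}{Y^H W^{-1} Y}} \sim F_{2p,\,2(n-p+1)}(2\mu^H S^{-1}\mu),$$

which implies our result

$$\frac{n-p+1}{p}\cdot\frac{(Z^H S^{-1} Z)(Y^H W^{-1} Y)}{Y^H\Sigma^{-1}Y} \sim F_{2p,\,2(n-p+1)}(2\mu^H S^{-1}\mu).$$

□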


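Corollary 20 Let $Z \sim CN_p(\mu, aR)$ and $W \sim CW_p(n, bR)$ such that $R > 0$ and $n > p$, where $a, b > 0$. Then

$$\frac{b(n-p+1)}{ap}\, Z^H W^{-1} Z \sim F_{2p,\,2(n-p+1)}\left(\frac{2}{a}\mu^H R^{-1}\mu\right).$$

Proof. Let $Y = Z$, $S = aR$, and $\Sigma = bR$. Note that $S^{-1} = \frac{1}{a}R^{-1}$ and $\Sigma^{-1} = \frac{1}{b}R^{-1}$. Then

$$\frac{2(n-p+1)}{2p}\cdot\frac{Z^H(aR)^{-1}Z \; Z^H W^{-1} Z}{Z^H(bR)^{-1}Z} = \frac{2(n-p+1)}{2p}\cdot\frac{\frac{1}{a}Z^H R^{-1} Z}{\frac{1}{b}Z^H R^{-1} Z}\, Z^H W^{-1} Z = \frac{b(n-p+1)}{ap}\, Z^H W^{-1} Z.$$

The final result follows immediately from this by applying theorem 66, with noncentrality parameter $2\mu^H S^{-1}\mu = \frac{2}{a}\mu^H R^{-1}\mu$. □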


Appendix  E 
DISTRIBUTIONS, PART II

This work was undertaken to determine the density function of the complex Wishart distribution because only a few reports of the density function exist in the literature, and these were not identical. I considered the use of the correct result to be critical to the primary question of order determination. In preparing a background for this task, I discovered that the pieces of needed knowledge were scattered throughout the literature using different notational conventions, did not form a complete theory, and occasionally contained minor errors, which inevitably get through any editorial and publishing process. In writing the main part of this thesis, I intended to draw only on those portions of the general development of complex multivariate statistics that were absolutely needed. It became obvious that it was both needed and simpler to produce a well organized presentation of the material. Much of the material to follow has most likely resided in the minds of others, but I have not found it. Readers are encouraged to communicate their findings so that a history of this fascinating subject can be constructed.

What follows began as a complexification of Chapter 17 of Arnold's well organized text [31]. It has a very nice development of ideas: it uses matrix notation throughout, it uses norms and projections where it can do so profitably, and it draws upon some group theoretic ideas. To develop the needed theory for the change of complex variables, extensive adaptation was made to results of Chapter 2 and the Appendix of Muirhead's fine text [187]. Other references have been used where needed. An attempt has been made to accentuate the similarity of this work with the works of others. By reading the sources and examining the enclosed work, it is hoped that others may learn quickly how to make adaptations from real variables to complex variables.


E.1 Complex Wishart Density

The purpose of this appendix is to prove the form of the density function of the complex Wishart distribution. Several respectable references give conflicting expressions for this density function. It is shown in this appendix that Goodman [92] provided the correct form. Use of the correct form is critical to future work. Therefore, three different derivations are presented to gain confidence that the correct result is obtained. The first is a complexification of the derivation done by Arnold [31], which gives a proof by induction. This approach has not been previously applied to the complex case. The second derivation is the one by Goodman from his classic paper cited above. The third, more general result is by Srivastava [256], which has the complex Wishart density as a special case. It is reassuring that we get the same answer in three different ways.


E.1.1 Arnold's Proof by Induction

This is a detailed complexification and extension of the derivation provided by Arnold in Section 17.6 of his text. The goal is to develop a complex version of Arnold's Theorem 17.12. We want to find the density function for the nonsingular central complex Wishart distribution. First, consider the simplified case of $\Sigma = I$.

Let $W \sim CW_p(n, I)$, $n > p$.


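First, let $p = 1$. Then by Theorem 53, $2W \sim \chi^2_{2n}(0)$. Anderson [26] gives the density of the noncentral $\chi^2$ distribution with $p$ degrees of freedom and noncentrality parameter $\tau^2$ as

$$f(u; p, \tau^2) = \frac{1}{2^{p/2}}\exp\left[-\tfrac{1}{2}(\tau^2 + u)\right] u^{p/2 - 1}\sum_{\beta=0}^{\infty}\left(\frac{\tau^2}{4}\right)^{\beta}\frac{u^{\beta}}{\beta!\,\Gamma(\frac{p}{2} + \beta)}$$

Let $p = 2n$, $\tau^2 = 0$, $\beta = k$, and $u = x$. Then only the $k = 0$ term survives, and

$$f(x; 2n, 0) = \frac{1}{2^n\Gamma(n)}x^{n-1}e^{-x/2}.$$

Perform a change of variables $x = 2W$, which implies $dx = 2(dW)$. Then $W$ has a density function given by

$$f(W) = \frac{2^{-n}}{\Gamma(n)}(2W)^{n-1}\exp(-W)\,2(dW) = \frac{W^{n-1}\exp(-W)}{\Gamma(n)}(dW).$$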

For $p > 1$, we will prove by induction that

$$f(W) = \frac{|\det W|^{n-p}\operatorname{etr}(-W)}{\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(n-i+1)}(dW).$$

Then the result will be extended to $CW_p(n, \Sigma)$ by a change of variables. Assume that, for $W_{p-1} \sim CW_{p-1}(n, I)$,

$$f(W_{p-1}) = \frac{|\det W_{p-1}|^{n-p+1}\operatorname{etr}(-W_{p-1})}{\pi^{(p-1)(p-2)/2}\prod_{i=1}^{p-1}\Gamma(n-i+1)}(dW_{p-1}).$$

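Then we can express $W$ in terms of $T$, $U$, and $V$. We know the conditional distributions involving $T$, $U$, and $V$ from our proof of lemma 19. Recall that the assumptions were $W \sim CW_p(n, \Sigma)$, $\Sigma > 0$, $n > p$, where $W$ and $\Sigma$ are partitioned such that $W_{11}$ and $\Sigma_{11}$ are $q \times q$. We defined $T = W_{22}$, $U = W_{22}^{-1}W_{21}$, and $V = W_{11} - W_{12}W_{22}^{-1}W_{21}$. Then

$$T \sim CW_{p-q}(n, \Sigma_{22})$$

$$(U \mid T) \sim CN_{p-q,q}(\Sigma_{22}^{-1}\Sigma_{21},\, T^{-1},\, \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$$

$$V \mid (U, T) \sim CW_q(n - p + q,\, \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$$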

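The joint density of $T$, $U$, $V$ is then

$$f(T, U, V) = f(V \mid U, T)\,f(U, T) = f(V \mid U, T)\,f(U \mid T)\,f(T)$$

By a change of variables, we get

$$f(W) = f_{V\mid(U,T)}(W_{11} - W_{12}W_{22}^{-1}W_{21} \mid W_{22}^{-1}W_{21}, W_{22}) \times f_{U\mid T}(W_{22}^{-1}W_{21} \mid W_{22})\, f_T(W_{22})\, |J| \qquad \text{(E.1)}$$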

To evaluate the Jacobian, look closer at the change of variables. Suppose

$$\begin{aligned}
y_1 &= g_1(x_1, x_2, x_3) = W_{22} = T = g_1(T, U, V)\\
y_2 &= g_2(x_1, x_2, x_3) = W_{21} = TU = g_2(T, U, V)\\
y_3 &= g_3(x_1, x_2, x_3) = W_{11} = V + U^H T U = g_3(T, U, V)
\end{aligned}$$

The inverse transformations are

$$\begin{aligned}
x_1 &= f_1(y_1, y_2, y_3) = T = W_{22} = f_1(W_{22}, W_{21}, W_{11})\\
x_2 &= f_2(y_1, y_2, y_3) = U = W_{22}^{-1}W_{21} = f_2(W_{22}, W_{21}, W_{11})\\
x_3 &= f_3(y_1, y_2, y_3) = V = W_{11} - W_{12}W_{22}^{-1}W_{21} = f_3(W_{22}, W_{21}, W_{11})
\end{aligned}$$

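The absolute value of the Jacobian is computed from

$$J = \begin{pmatrix} \partial T/\partial W_{22} & \partial T/\partial W_{21} & \partial T/\partial W_{11}\\ \partial U/\partial W_{22} & \partial U/\partial W_{21} & \partial U/\partial W_{11}\\ \partial V/\partial W_{22} & \partial V/\partial W_{21} & \partial V/\partial W_{11} \end{pmatrix} = \begin{pmatrix} I & 0 & 0\\ \cdot & W_{22}^{-1} & 0\\ \cdot & \cdot & I \end{pmatrix}$$

where the dots indicate terms not evaluated because they can be ignored when taking the determinant of a block triangular matrix. The middle block multiplies each of the $q$ columns of $W_{21}$ by $W_{22}^{-1}$. We use $|\det W_{22}|^{-2q}$ rather than $|\det W_{22}|^{-q}$ in evaluating the Jacobian because we are now doing a change of complex variables, in accordance with theorem 22. Thus, we can write

$$|\det J| = |\det W_{22}|^{-2q}$$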

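Restricting attention to the case where $q = 1$, and writing $W_{p-1} = W_{22}$, we obtain

$$(U \mid T) \sim CN_{p-1,1}(\Sigma_{22}^{-1}\Sigma_{21},\, T^{-1},\, \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$$

$$V \mid (U, T) \sim CW_1(n - p + 1,\, \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$$

In the induction $\Sigma = I$, so the mean of $U$ vanishes and both scale parameters reduce to identity matrices. The absolute value of the Jacobian is

$$|J(T, U, V \to W_{11}, W_{21}, W_{p-1})| = |\det W_{p-1}|^{-2}$$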

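Then equation E.1 becomes

$$\begin{aligned}
f(W) &= \frac{(W_{11} - W_{12}W_{p-1}^{-1}W_{21})^{n-p}\exp[-(W_{11} - W_{12}W_{p-1}^{-1}W_{21})]}{\Gamma(n-p+1)}\cdot\frac{|\det W_{p-1}|\exp(-W_{12}W_{p-1}^{-1}W_{21})}{\pi^{p-1}}\\
&\quad\times\frac{|\det W_{p-1}|^{n-p+1}\operatorname{etr}(-W_{p-1})}{\pi^{(p-1)(p-2)/2}\prod_{i=1}^{p-1}\Gamma(n-i+1)}\cdot|\det W_{p-1}|^{-2}\\
&= \frac{|W_{11} - W_{12}W_{p-1}^{-1}W_{21}|^{n-p}\,|\det W_{p-1}|^{n-p}\exp[-W_{11} - \operatorname{tr}(W_{p-1})]}{\pi^{p-1+(p-1)(p-2)/2}\,\Gamma(n-p+1)\Gamma(n-p+2)\Gamma(n-p+3)\cdots\Gamma(n)}
\end{aligned}$$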


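Observe that

$$p - 1 + \frac{(p-1)(p-2)}{2} = \frac{2p - 2 + p^2 - 3p + 2}{2} = \frac{p(p-1)}{2}$$

and

$$|W_{11} - W_{12}W_{p-1}^{-1}W_{21}|\cdot|\det W_{p-1}| = |\det W|$$

by the partitioned matrix determinant. Also,

$$W_{11} + \operatorname{tr}(W_{p-1}) = \operatorname{tr}(W)$$

by a property of the trace function (the trace of a partitioned matrix is the sum of the traces of its diagonal blocks). Thus

$$f(W) = \frac{|W|^{n-p}\exp[-W_{11} - \operatorname{tr}(W_{p-1})]}{\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(n-p+i)} = \frac{|W|^{n-p}\operatorname{etr}(-W)}{\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(n-p+i)}$$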

We recognize this as the density function of $CW_p(n, I)$. Thus, by induction, we proved the result true for $p = 1$, assumed it was true for $p - 1$, and based on this assumption showed it true for $p$. Thus, it is true for all $p$.
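The bookkeeping in the induction step can be checked numerically at a single point. The sketch below assumes NumPy, and its helper names are hypothetical. It draws one Hermitian positive definite $W$, forms $T = W_{p-1}$, $U = W_{22}^{-1}W_{21}$, and $V = W_{11} - W_{12}W_{22}^{-1}W_{21}$, and confirms that the log of the product in equation E.1 (with $\Sigma = I$, the density $\pi^{-(p-1)}|\det T|\exp(-U^H T U)$ for $U \mid T$, and the Jacobian $|\det W_{p-1}|^{-2}$) equals the log of the $CW_p(n, I)$ density.

```python
import math
import numpy as np

def log_cgamma(p, n):
    """log C Gamma_p(n) = (p(p-1)/2) log(pi) + sum_{i=1}^p log Gamma(n - i + 1)."""
    return 0.5 * p * (p - 1) * math.log(math.pi) + sum(math.lgamma(n - i + 1) for i in range(1, p + 1))

def log_cw_identity(W, n):
    """log density of CW_p(n, I) at a Hermitian positive definite W."""
    p = W.shape[0]
    _, logdet = np.linalg.slogdet(W)
    return (n - p) * logdet - np.real(np.trace(W)) - log_cgamma(p, n)

rng = np.random.default_rng(2)
p, n = 4, 7
G = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
W = G @ G.conj().T

# Partition with q = 1: W11 is 1 x 1 and T = W22 = W_{p-1} is (p-1) x (p-1).
W11, W12, W21, T = W[0, 0].real, W[0, 1:], W[1:, 0], W[1:, 1:]
U = np.linalg.solve(T, W21)       # U = W22^{-1} W21
V = W11 - np.real(W12 @ U)        # V = W11 - W12 W22^{-1} W21 (a positive scalar)
_, logdet_T = np.linalg.slogdet(T)

log_f_T = log_cw_identity(T, n)                                                # T ~ CW_{p-1}(n, I)
log_f_U = -(p - 1) * math.log(math.pi) + logdet_T - np.real(U.conj() @ T @ U)  # U | T ~ CN_{p-1}(0, T^{-1})
log_f_V = (n - p) * math.log(V) - V - math.lgamma(n - p + 1)                   # V | (U, T) ~ CW_1(n-p+1, 1)
log_jac = -2.0 * logdet_T                                                      # |det W_{p-1}|^{-2}

lhs = log_f_V + log_f_U + log_f_T + log_jac
rhs = log_cw_identity(W, n)
print(lhs, rhs)  # equal up to rounding
```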

Now, the result must be extended to general $\Sigma = \Sigma^H > 0$. To bridge the gap from $CW_p(n, I)$ to $CW_p(n, \Sigma)$, I must first make the transition from $CN(0, I, I)$ to $CN(0, I, \Sigma)$. Suppose that the $n \times p$ matrix random variable $Y$ has the matrix complex normal distribution $CN_{n,p}(0_{n\times p}, I_n, \Sigma_{p\times p})$. By theorem 51, the density of $Y$ is given by

$$f(Y) = \pi^{-np}|\Sigma|^{-n}\exp[-\operatorname{tr}(Y\Sigma^{-1}Y^H)]$$

Since $\Sigma^{-1}$ is positive definite, it can be factored into $\Sigma^{-1} = TT^H$, where $T$ is $p \times p$ lower triangular with positive real diagonal elements. Thus

$$f(Y) = \pi^{-np}|\Sigma|^{-n}\exp[-\operatorname{tr}(YTT^HY^H)]$$
Let $X = YT$, and thus $Y = XT^{-1}$. Then by lemma 5 the Jacobian is $J(Y \to X) = \prod_{i=1}^{p} t_{ii}^{-2n}$. Therefore

$$f(XT^{-1}) = \pi^{-np}|\Sigma|^{-n}\exp[-\operatorname{tr}(XX^H)]$$

Since I left this as $f(XT^{-1})$ rather than $f(X)$, no Jacobian was needed.
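The whitening step $X = YT$ with $\Sigma^{-1} = TT^H$ can be illustrated numerically. In the sketch below (NumPy assumed; $\Sigma$ and the sample size are hypothetical), $T$ is taken as the Cholesky factor of $\Sigma^{-1}$, which is lower triangular with positive real diagonal. The code checks that the exponent is unchanged, $\operatorname{tr}(Y\Sigma^{-1}Y^H) = \operatorname{tr}(XX^H)$, and that the rows of $X$ behave like $CN_p(0, I)$, so $X^HX/n$ is close to $I$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200000, 3
sigma = np.array([[2.0, 0.5 + 0.5j, 0.0],
                  [0.5 - 0.5j, 1.0, 0.2j],
                  [0.0, -0.2j, 1.5]])

# Sigma^{-1} = T T^H with T lower triangular and positive real diagonal.
T = np.linalg.cholesky(np.linalg.inv(sigma))

# Rows y of Y have density proportional to exp(-y sigma^{-1} y^H), so E[Y^H Y] = n * sigma.
L = np.linalg.cholesky(sigma)
G = (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))) / np.sqrt(2)
Y = (L @ G).conj().T   # n x p
X = Y @ T              # X = Y T, so Y = X T^{-1}

# tr(Y sigma^{-1} Y^H) and tr(X X^H), computed without forming n x n matrices.
lhs = np.real(np.sum((Y @ np.linalg.inv(sigma)) * Y.conj()))
rhs = np.sum(np.abs(X) ** 2)
cov_X = X.conj().T @ X / n  # should be close to the identity
print(lhs, rhs)
print(np.round(cov_X, 3))
```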

From this point, this is a complexification and expansion of Arnold's corollary to his theorem 17.12. By corollary 36 (Cholesky or Bartlett decomposition) there exists a unique $p \times p$ lower triangular matrix $L$ with positive diagonal elements such that $\Sigma = LL^H$. Let $B \sim CW_p(n, I)$. Then

$$W = LBL^H \sim CW_p(n, LIL^H) = CW_p(n, \Sigma)$$

by theorem 54. The Jacobian is $J(B \to W) = (\det\Sigma)^{-p}$. Thus

$$f(W) = f_B(L^{-1}WL^{-H})\,J(B \to W)$$

and so

$$\begin{aligned}
f(W) &= \frac{|\det(L^{-1}WL^{-H})|^{n-p}\operatorname{etr}(-L^{-1}WL^{-H})}{\pi^{p(p-1)/2}(\det\Sigma)^p\prod_{i=1}^{p}\Gamma(n-i+1)}(dW)\\
&= \frac{(\det L)^{-(n-p)}\,|\det W|^{n-p}\,(\det L^H)^{-(n-p)}\operatorname{etr}(-L^{-H}L^{-1}W)}{(\det\Sigma)^p\,C\Gamma_p(n)}(dW)\\
&= \frac{(\det\Sigma)^{-(n-p)}\,|\det W|^{n-p}\operatorname{etr}(-\Sigma^{-1}W)}{(\det\Sigma)^p\,C\Gamma_p(n)}(dW)
\end{aligned}$$

$$f(W) = \frac{|\det W|^{n-p}\operatorname{etr}(-\Sigma^{-1}W)}{(\det\Sigma)^n\,C\Gamma_p(n)}(dW) \qquad \text{(E.2)}$$

where $C\Gamma_p(n)$ is the complex multivariate gamma function. This is the final form of the probability density function for the central complex Wishart distribution $CW_p(n, \Sigma)$.
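Equation E.2 translates directly into a log-density function. The sketch below assumes NumPy, and its function names are hypothetical. It implements $\log C\Gamma_p(n) = \frac{p(p-1)}{2}\log\pi + \sum_{i=1}^p \log\Gamma(n-i+1)$ and the log of E.2, and runs two consistency checks: for $p = 1$ the density reduces to a Gamma density with shape $n$ and scale $\sigma^2$, and the change of variables $W = LBL^H$ reproduces $f(W) = f_B(L^{-1}WL^{-H})(\det\Sigma)^{-p}$.

```python
import math
import numpy as np

def log_complex_mv_gamma(p, n):
    """log C Gamma_p(n), where C Gamma_p(n) = pi^{p(p-1)/2} prod_{i=1}^p Gamma(n - i + 1)."""
    return 0.5 * p * (p - 1) * math.log(math.pi) + sum(math.lgamma(n - i + 1) for i in range(1, p + 1))

def log_complex_wishart_pdf(W, n, sigma):
    """log of |det W|^{n-p} etr(-sigma^{-1} W) / ((det sigma)^n C Gamma_p(n)), as in equation E.2."""
    p = W.shape[0]
    _, logdet_W = np.linalg.slogdet(W)
    _, logdet_S = np.linalg.slogdet(sigma)
    trace_term = np.real(np.trace(np.linalg.solve(sigma, W)))  # tr(sigma^{-1} W)
    return (n - p) * logdet_W - n * logdet_S - trace_term - log_complex_mv_gamma(p, n)

# Check 1: for p = 1 the density is Gamma(shape n, scale s2).
n, s2, w = 5, 1.7, 3.2
gamma_logpdf = (n - 1) * math.log(w) - n * math.log(s2) - w / s2 - math.lgamma(n)
check1 = log_complex_wishart_pdf(np.array([[w]]), n, np.array([[s2]])) - gamma_logpdf

# Check 2: f(W) = f_B(L^{-1} W L^{-H}) (det sigma)^{-p}, where sigma = L L^H.
rng = np.random.default_rng(4)
p, n = 3, 6
A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
sigma = A @ A.conj().T + p * np.eye(p)
G = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
W = G @ G.conj().T
Linv = np.linalg.inv(np.linalg.cholesky(sigma))
B = Linv @ W @ Linv.conj().T
_, logdet_S = np.linalg.slogdet(sigma)
check2 = log_complex_wishart_pdf(W, n, sigma) - (log_complex_wishart_pdf(B, n, np.eye(p)) - p * logdet_S)
print(check1, check2)  # both should be zero up to rounding
```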
E.1.2 Goodman's Construction

The following derivation follows that of Goodman [92]. Goodman was the first to publish the density of the complex Wishart distribution.

Consider the matrix Laplace transform of $(\det W)^k$, where $W$ is a random $p \times p$ Hermitian positive definite matrix variable, $W = W^H > 0$. This is given by

$$I(\Sigma) = \int_W |W|^k\exp[-\operatorname{tr}(\Sigma^{-1}W)]\,(dW) \qquad \text{(E.3)}$$

where the integral is taken over all $W = W^H > 0$.

Let $T^HT = W$ be a change of variables from $W$ to $T$, where $T$ is a complex upper triangular matrix with positive real elements on the diagonal. By theorem 27, the Jacobian of this transformation is

$$J(W \to T) = 2^p\prod_{i=1}^{p} t_{ii}^{2(p-i)+1} \qquad \text{(E.4)}$$

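Consider the special case where $\Sigma = \Lambda^2 = \operatorname{diag}(\lambda_1^2, \ldots, \lambda_p^2)$. Then

$$\Lambda^{-2}T^HT = \begin{pmatrix}
\lambda_1^{-2}t_{11}^2 & \lambda_1^{-2}t_{11}t_{12} & \cdots & \lambda_1^{-2}t_{11}t_{1p}\\
\lambda_2^{-2}\bar{t}_{12}t_{11} & \lambda_2^{-2}(|t_{12}|^2 + t_{22}^2) & \cdots & \lambda_2^{-2}(\bar{t}_{12}t_{1p} + t_{22}t_{2p})\\
\vdots & \vdots & \ddots & \vdots\\
\lambda_p^{-2}\bar{t}_{1p}t_{11} & \lambda_p^{-2}(\bar{t}_{1p}t_{12} + \bar{t}_{2p}t_{22}) & \cdots & \lambda_p^{-2}(|t_{1p}|^2 + |t_{2p}|^2 + \cdots + t_{pp}^2)
\end{pmatrix}$$

Thus

$$\operatorname{tr}(\Lambda^{-2}T^HT) = \lambda_1^{-2}t_{11}^2 + \lambda_2^{-2}(|t_{12}|^2 + t_{22}^2) + \cdots + \lambda_p^{-2}(|t_{1p}|^2 + |t_{2p}|^2 + \cdots + t_{pp}^2)$$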


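In preparation for the next leap of faith, we note that

$$(\det W)^k = [\det(T^HT)]^k = |\det T^H|^k\cdot|\det T|^k = |\det T|^{2k}$$

since $\det T^H = \overline{\det T} = |\det T|$ (the determinant of $T$ is the product of its positive real diagonal elements). Substituting back into equation E.3 we get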


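$$\begin{aligned}
I(\Lambda^2) &= \int |\det T|^{2k}\exp[-\operatorname{tr}(\Lambda^{-2}T^HT)]\,J(W \to T)\,(dT)\\
&= 2^p\int\prod_{i=1}^{p} t_{ii}^{2k+2(p-i)+1}\exp\left[-\lambda_1^{-2}t_{11}^2 - \lambda_2^{-2}(|t_{12}|^2 + t_{22}^2) - \cdots - \lambda_p^{-2}(|t_{1p}|^2 + |t_{2p}|^2 + \cdots + t_{pp}^2)\right](dT)\\
&= 2^p\left[\prod_{i=1}^{p}\int_0^\infty t_{ii}^{2(k+p-i)+1}\exp(-\lambda_i^{-2}t_{ii}^2)\,dt_{ii}\right]\left[\prod_{i=2}^{p}\prod_{j=1}^{i-1}\int_{\mathbb{C}}\exp(-\lambda_i^{-2}|t_{ji}|^2)\,dt_{ji}\right]\\
&= 2^p\left[\prod_{i=1}^{p}\tfrac{1}{2}\lambda_i^{2(k+p-i+1)}\Gamma(k+p-i+1)\right]\left[\prod_{i=2}^{p}\prod_{j=1}^{i-1}\pi\lambda_i^2\right]\\
&= 2^p2^{-p}\left[\prod_{i=1}^{p}\lambda_i^{2(k+p-i+1)}\Gamma(k+p-i+1)\right]\pi^{p(p-1)/2}\left[\prod_{i=2}^{p}\lambda_i^{2(i-1)}\right]
\end{aligned}$$

where

$$\prod_{i=2}^{p}\prod_{j=1}^{i-1}\pi = \pi^{p(p-1)/2}.$$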

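This is most easily seen by considering a triangular array of the constant $\pi$:

$$\begin{pmatrix}
\times & \pi & \pi & \cdots & \pi\\
 & \times & \pi & \cdots & \pi\\
 & & \ddots & \ddots & \vdots\\
 & & & \times & \pi\\
 & & & & \times
\end{pmatrix}_{p\times p}$$

The columns are indexed by $i$ and the rows are indexed by $j$. There are $p(p-1)/2$ elements in the array above the diagonal. Also note that $\lambda_i^{2(i-1)} = 1$ when $i = 1$. Thus

$$\prod_{i=2}^{p}\lambda_i^{2(i-1)} = \prod_{i=1}^{p}\lambda_i^{2(i-1)}$$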

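which implies

$$I(\Lambda^2) = \pi^{p(p-1)/2}\prod_{i=1}^{p}\left(\lambda_i^{2(k+p-i+1+i-1)}\Gamma(k+p-i+1)\right) = \pi^{p(p-1)/2}\prod_{i=1}^{p}\left(\lambda_i^{2(k+p)}\Gamma(k+p-i+1)\right)$$

Since $\det(\Lambda^2) = \prod_{i=1}^{p}\lambda_i^2$,

$$I(\Lambda^2) = \pi^{p(p-1)/2}[\det\Lambda^2]^{k+p}\prod_{i=1}^{p}\Gamma(k+i),$$

where we have used

$$\prod_{i=1}^{p}\Gamma(k+p-i+1) = \Gamma(k+p)\Gamma(k+p-1)\cdots\Gamma(k+1) = \prod_{i=1}^{p}\Gamma(k+i).$$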
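The evaluation of $I(\Lambda^2)$ can be cross-checked numerically. In the sketch below (NumPy assumed; all function names are hypothetical), each one-dimensional integral in the factorization is computed with the trapezoid rule: the diagonal integrals $\int_0^\infty t^{2(k+p-i)+1}e^{-t^2/\lambda_i^2}\,dt$ directly, and the off-diagonal integrals $\int_{\mathbb{C}} e^{-|t|^2/\lambda_i^2}\,dt$ in polar coordinates. Then $2^p$ times their product is compared with $\pi^{p(p-1)/2}[\det\Lambda^2]^{k+p}\prod_{i=1}^p\Gamma(k+i)$. A real $k$ is used for simplicity.

```python
import math
import numpy as np

def trapezoid(f_vals, h):
    """Composite trapezoid rule on an equally spaced grid with spacing h."""
    return h * (f_vals.sum() - 0.5 * (f_vals[0] + f_vals[-1]))

def diagonal_integral(a, lam, num=200001):
    """Numerically integrate int_0^inf t^{2a+1} exp(-t^2 / lam^2) dt (truncated at 15 lam)."""
    t = np.linspace(0.0, 15.0 * lam, num)
    return trapezoid(t ** (2 * a + 1) * np.exp(-(t / lam) ** 2), t[1] - t[0])

def offdiagonal_integral(lam, num=200001):
    """Integrate exp(-|t|^2 / lam^2) over the complex plane, in polar coordinates."""
    r = np.linspace(0.0, 15.0 * lam, num)
    return 2.0 * math.pi * trapezoid(r * np.exp(-(r / lam) ** 2), r[1] - r[0])

def laplace_by_factorization(lams, k):
    """2^p times the product of the one-dimensional integrals that I(Lambda^2) factors into."""
    p = len(lams)
    total = 2.0 ** p
    for i in range(1, p + 1):  # i is 1-based, as in the text
        lam = lams[i - 1]
        total *= diagonal_integral(k + p - i, lam)
        total *= offdiagonal_integral(lam) ** (i - 1)  # column i has i - 1 entries above the diagonal
    return total

def laplace_closed_form(lams, k):
    """pi^{p(p-1)/2} [det Lambda^2]^{k+p} prod_{i=1}^p Gamma(k + i)."""
    p = len(lams)
    det_lam2 = float(np.prod(np.asarray(lams) ** 2))
    return math.pi ** (p * (p - 1) / 2) * det_lam2 ** (k + p) * math.prod(math.gamma(k + i) for i in range(1, p + 1))

lams, k = [0.8, 1.2, 1.7], 1.5
numeric = laplace_by_factorization(lams, k)
exact = laplace_closed_form(lams, k)
print(numeric, exact)
```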


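Therefore

$$I(\Lambda^2) = \int_{W>0}|W|^k\operatorname{etr}[-\Lambda^{-2}W]\,(dW) = \pi^{p(p-1)/2}[\det\Lambda^2]^{k+p}\,\Gamma(k+p)\Gamma(k+p-1)\cdots\Gamma(k+1)$$

Let $\Sigma = U^H\Lambda^2U$ where $U^HU = I$. Note: $\Sigma^{-1} = U^H\Lambda^{-2}U$. Then

$$\int_{W>0}|\det W|^k\operatorname{etr}[-\Sigma^{-1}W]\,(dW) = \int_{W>0}|\det W|^k\operatorname{etr}[-U^H\Lambda^{-2}UW]\,(dW) = \int_{W>0}|\det W|^k\operatorname{etr}[-\Lambda^{-2}UWU^H]\,(dW)$$

Let $K = UWU^H$, which implies $W = U^HKU$. By corollary 7, $J(K \to W) = 1$, which implies $J(W \to K) = 1$. This gives us

$$\begin{aligned}
\int_{W>0}|\det W|^k\operatorname{etr}[-\Lambda^{-2}UWU^H]\,(dW) &= \int_{K>0}|\det(U^HKU)|^k\operatorname{etr}[-\Lambda^{-2}K]\,(dK)\\
&= \int_{K>0}|K|^k\operatorname{etr}[-\Lambda^{-2}K]\,(dK)\\
&= \pi^{p(p-1)/2}[\det\Lambda^2]^{k+p}\,\Gamma(k+p)\Gamma(k+p-1)\cdots\Gamma(k+1)\\
&= \pi^{p(p-1)/2}[\det\Sigma]^{k+p}\,\Gamma(k+p)\Gamma(k+p-1)\cdots\Gamma(k+1)
\end{aligned}$$

since $\det(\Sigma) = \det(U^H\Lambda^2U) = \det(\Lambda^2)$.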
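Therefore

$$\frac{\int_{W>0}|\det W|^k\operatorname{etr}[-\Sigma^{-1}W]\,(dW)}{\pi^{p(p-1)/2}[\det\Sigma]^{k+p}\prod_{i=1}^{p}\Gamma(k+i)} = 1$$

Thus

$$\frac{|\det W|^k\operatorname{etr}[-\Sigma^{-1}W]}{\pi^{p(p-1)/2}[\det\Sigma]^{k+p}\prod_{i=1}^{p}\Gamma(k+i)}(dW)$$

is a density function on the space of Hermitian positive definite matrices. Let $k = n - p$. Then for $CW_p(n, \Sigma)$ we have

$$f(W; n, p, \Sigma) = \frac{|\det W|^{n-p}\operatorname{etr}[-\Sigma^{-1}W]}{\pi^{p(p-1)/2}(\det\Sigma)^n\prod_{i=1}^{p}\Gamma(n-p+i)}(dW) = \frac{|\det W|^{n-p}\operatorname{etr}[-\Sigma^{-1}W]}{\pi^{p(p-1)/2}[\det\Sigma]^n\prod_{i=1}^{p}\Gamma(n-i+1)}(dW)$$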

This agrees with Goodman equation (1.6) [92]. The function $f(W; \Sigma)$ is the probability density function for the central complex Wishart distribution. For the density function to exist, $\Sigma$ must be nonsingular. Note that both $W$ and $\Sigma$ are Hermitian positive definite. $W$ is a random variable; $\Sigma$, $n$, and $p$ are fixed parameters. $W$ is $p \times p$ of full rank. In equation E.3, $k$ is an arbitrary complex constant with $\operatorname{Re}(k + i) > 0$ for $1 \le i \le p$. We finally let $k = n - p$ to obtain the density for the complex Wishart distribution, where $n$ is taken to be the number of samples of the multivariate complex normal distribution used in forming $W$. This relation becomes more apparent in other derivations of this density function. It can also be seen in the extension to the complex case of Arnold's discussion of the Wishart distribution. A more compact notation for the density function is

$$f(W) = \frac{|\det W|^{n-p}\operatorname{etr}[-\Sigma^{-1}W]}{[\det\Sigma]^n\,C\Gamma_p(n)}(dW)$$

where

$$C\Gamma_p(n) = \pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(n-i+1)$$

is the complex multivariate gamma function. A shorthand notation for this complex Wishart distribution is $W \sim CW_p(n, \Sigma)$. This distribution has not been around and used long enough for a notation to become standard. Of course, we still do not have a universally accepted notation for the chi-square distribution yet, either.
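The remark that $n$ is the number of $CN_p(0, \Sigma)$ samples used in forming $W$ suggests a simple simulation. The sketch below (NumPy assumed; the sampler name and parameter values are hypothetical) builds $W = \sum_{k=1}^n z_k z_k^H$ and checks two consequences: each draw is Hermitian positive definite when $n \ge p$, and the Monte Carlo average is close to $n\Sigma$, which is the mean of $W$ under this construction.

```python
import numpy as np

def sample_complex_wishart(n, sigma, reps, rng):
    """Draw `reps` matrices W = sum_{k=1}^n z_k z_k^H with z_k iid CN_p(0, sigma)."""
    p = sigma.shape[0]
    L = np.linalg.cholesky(sigma)
    g = (rng.standard_normal((reps, p, n)) + 1j * rng.standard_normal((reps, p, n))) / np.sqrt(2)
    z = L @ g
    return z @ np.conj(np.swapaxes(z, -1, -2))

rng = np.random.default_rng(5)
p, n, reps = 3, 6, 20000
sigma = np.array([[2.0, 0.5 + 0.5j, 0.0],
                  [0.5 - 0.5j, 1.0, 0.2j],
                  [0.0, -0.2j, 1.5]])
W = sample_complex_wishart(n, sigma, reps, rng)

hermitian = np.allclose(W, np.conj(np.swapaxes(W, -1, -2)))
min_eig = np.linalg.eigvalsh(W).min()  # smallest eigenvalue over all draws
mean_W = W.mean(axis=0)
print(hermitian, min_eig > 0)
print(np.round(mean_W / n, 2))         # should be close to sigma
```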

E.1.3 Srivastava's Derivation

Srivastava [256] provided a derivation that obtained a more general result in the process. The following discussion expands Srivastava's work. He begins with a lemma.

Lemma 20 If $Y$ is a matrix of complex elements of order $p \times m$ where $p < m$, and $\operatorname{rank}(Y) = p$, then there exists a unique lower triangular matrix $T$ with positive real diagonal elements and a matrix $H$ such that $HH^H = I_p$ and $Y = TH$. The matrices live in the following spaces: $Y \in \mathbb{C}^{p\times m}$, $T \in \mathbb{C}^{p\times p}$, and $H \in \mathbb{C}^{p\times m}$. Note that because $H$ is not square and because $HH^H = I_p$, $H$ is called semi-unitary.

Proof. This is corollary 37, where $Y = A$ and $T = L$, with appropriate changes in dimensions. □
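Lemma 20 is an LQ factorization. One way to compute it, sketched below with NumPy, is to take the reduced QR factorization $Y^H = QR$ and set $T = R^H$, $H = Q^H$, after rescaling by unit-modulus phases so that the diagonal of $T$ is real and positive. The function name is hypothetical, and this is not claimed to be Srivastava's or corollary 37's construction. Because $YY^H = TT^H$, uniqueness means $T$ must coincide with the Cholesky factor of $YY^H$, which the last line checks.

```python
import numpy as np

def lower_triangular_semi_unitary(Y):
    """Factor a full-row-rank p x m complex Y (p <= m) as Y = T H.

    T is p x p lower triangular with positive real diagonal; H is p x m with H H^H = I_p.
    Built from the reduced QR factorization Y^H = Q R, so that Y = R^H Q^H.
    """
    Q, R = np.linalg.qr(Y.conj().T)          # Q: m x p with orthonormal columns, R: p x p upper triangular
    phase = np.diag(R) / np.abs(np.diag(R))  # unit-modulus scalars
    R = np.conj(phase)[:, None] * R          # rows of R rescaled so diag(R) = |diag(R)| > 0
    Q = Q * phase[None, :]                   # columns of Q compensate, so Q R is unchanged
    return R.conj().T, Q.conj().T            # T = R^H, H = Q^H

rng = np.random.default_rng(6)
p, m = 3, 5
Y = rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))
T, H = lower_triangular_semi_unitary(Y)
print(np.allclose(T @ H, Y), np.allclose(H @ H.conj().T, np.eye(p)))
print(np.allclose(T, np.linalg.cholesky(Y @ Y.conj().T)))  # uniqueness: T is the Cholesky factor of Y Y^H
```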


Density of a Quadratic

The following theorem is a generalization of Anderson's Lemma 13.3.1 (2nd Edition, p. 533) [26] to a matrix $Y$ with complex elements. The original citation is to Anderson (1st Edition, 1958, p. 319).

Theorem 67 (Important) If the density $f(YY^H)$ of $Y_{p\times m}$ is a function of $YY^H$, then the density of $B = YY^H$ is given by

$$p(B) = \frac{|\det B|^{m-p}f(B)}{\pi^{-pm}\,\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(m-i+1)} = \frac{|\det B|^{m-p}f(B)}{\pi^{-pm}\,C\Gamma_p(m)}$$

Note that when $Y \sim CN_{m,p}(0_{m\times p}, I_m, I_p)$, then

$$p(Y) = \pi^{-mp}\operatorname{etr}(-YY^H) = f(YY^H)$$

$Y$ can also be viewed as a random sample of size $m$ from the complex multivariate standard normal distribution $CN_p(0_p, I_p)$. Then $p(B)$ is the density of $B \sim CW_p(m, I_p)$.

Proof. From lemma 20, we can write $Y = TH$, where $T_{p\times p}$ is complex lower triangular with positive real diagonal elements and $HH^H = I_p$.

We shall now find the Jacobian of the transformation, $J(Y \to T, H)$. A basic property of Jacobians that is used is that if we change variables from $X$ to $Y$, then the Jacobian $J(X \to Y)$ is the same as the Jacobian $J[(dX) \to (dY)]$ of the change of variables from $(dX)$ to $(dY)$. See Deemer and Olkin Property 5.B.3 [67]. We see that $Y = TH$ implies that $(dY) = (dT)H + T(dH)$.

The object now is to define a series of transformations where the Jacobian of each individual transformation is more easily computed than that of the original transformation. We will take advantage of a property of Jacobians that says that

$$J(X \to Y) = J(X \to U)\,J(U \to V)\,J(V \to Y)$$

where $U$ and $V$ are any functions of $X$ and $Y$ such that none of the terms on the right vanish. See Deemer and Olkin Property 5.B.2 [67]. For completeness' sake, there are two more general properties of Jacobians that deserve to be mentioned. The first is

$$J(X \to Y) = [J(Y \to X)]^{-1}.$$

The second is a bit longer. If $X = F_1(U)$ and $Y = F_2(V)$ define a transformation from variables $(X, Y)$ to new variables $(U, V)$, then

$$J[(X, Y) \to (U, V)] = J(X \to U)\,J(Y \to V)$$

The sequence of transformations that Srivastava chose is as follows. Given

$$(dY) = (dT)H + T(dH)$$

premultiply by $T^{-1}$ to get

$$T^{-1}(dY) = T^{-1}(dT)H + (dH)$$

Let $U = T^{-1}(dY)$, resulting in

$$U = T^{-1}(dT)H + (dH)$$

Let $V = T^{-1}(dT)$ to produce $U = VH + (dH)$. The Jacobian is given by

$$J(Y \to T, H) = J(dY \to dT, dH) = J(dY \to U)\,J[U \to (V, dH)]\,J[(V, dH) \to (dT, dH)]$$

To see where this comes from, notice that $U = T^{-1}(dY)$ implies $dY = TU$. Therefore we seek $J(dY \to U) = J_1$. We also see that $U = VH + (dH)$ implies that we seek $J[U \to (V, dH)] = J_2$. Treating $V$ as a matrix of constants, this Jacobian is a function of only $H$. Let

$$J[U \to (V, dH)] = g(H) = J_2$$

The relation $V = T^{-1}(dT)$ implies we seek $J[(V, dH) \to (dT, dH)] = J_3$. Putting this all together, we see that

$$\begin{aligned}
dY &= TU && \text{using } J_1(dY \to U)\\
TU &= T[VH + (dH)] && \text{using } J_2(U \to V, dH)\\
T[VH + (dH)] &= TVH + T(dH) = TT^{-1}(dT)H + T(dH) && \text{using } J_3(V \to dT)\\
TT^{-1}(dT)H + T(dH) &= (dT)H + T(dH) = d(TH) = (dY) && \text{by theorem 21}
\end{aligned}$$
Consider $J_1 = J(dY \to U)$ where $dY = TU$. The matrix $dY$ is a matrix whose elements are the differentials of $Y_{p\times m}$. Thus, $dY$ is also of dimensions $p \times m$. $T$ is complex lower triangular of dimension $p \times p$ with positive real diagonal elements, and $U$ is $p \times m$. By lemma 3, we know that

$$J_1 = J(dY \to U) = \prod_{i=1}^{p} t_{ii}^{2m}$$
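The value of $J_1$ can be confirmed numerically. In the sketch below (NumPy assumed; `realify` is a hypothetical helper), the complex-linear map $U \mapsto TU$ on $p \times m$ matrices is written as $\operatorname{vec}(TU) = (I_m \otimes T)\operatorname{vec}(U)$, converted to a real matrix acting on real and imaginary parts, and its determinant is compared with $\prod_{i=1}^p t_{ii}^{2m}$. The general fact used is that the real Jacobian of a complex-linear map with matrix $M$ is $|\det M|^2$.

```python
import numpy as np

def realify(M):
    """Real 2N x 2N matrix of the complex-linear map x -> M x on C^N, in (Re x, Im x) coordinates."""
    return np.block([[M.real, -M.imag], [M.imag, M.real]])

rng = np.random.default_rng(7)
p, m = 3, 4
T = np.tril(rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p)), -1)
T += np.diag([1.5, 0.7, 2.0])  # positive real diagonal

# dY = T U acts on each of the m columns of U: vec(dY) = (I_m kron T) vec(U).
M = np.kron(np.eye(m), T)
jacobian = abs(np.linalg.det(realify(M)))
formula = np.prod(np.diag(T).real ** (2 * m))  # prod_i t_ii^{2m}
print(jacobian, formula)
```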

Consider

$$J_3 = J[(V, dH) \to (dT, dH)] = J(V \to dT)$$

where $V = T^{-1}(dT)$. $T_{p\times p}$ is lower triangular, so $(dT)$ and $T^{-1}$ are also lower triangular. Hence, $V$ is also lower triangular. All have real diagonal elements. It is simpler to examine $(dT) = TV$ and $J(dT \to V)$, and then take the inverse of the Jacobian. By lemma 6,

$$J(dT \to V) = \prod_{i=1}^{p} t_{ii}^{2i-1}.$$

Thus

$$J(V \to dT) = \prod_{i=1}^{p} t_{ii}^{-(2i-1)}$$
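Lemma 6 can likewise be checked numerically. The sketch below (NumPy assumed; `coordinates` and `basis` are hypothetical helpers) parametrizes lower triangular matrices with real diagonal by their $p^2$ real coordinates, builds the matrix of the linear map $V \mapsto TV$ column by column, and compares its determinant with $\prod_{i=1}^p t_{ii}^{2i-1}$, the reciprocal of $J(V \to dT)$.

```python
import numpy as np

def coordinates(A):
    """Real coordinates of the lower triangle of A, column by column:
    each diagonal entry contributes its real part, each entry below it a (real, imag) pair."""
    p = A.shape[0]
    out = []
    for j in range(p):
        for i in range(j, p):
            if i == j:
                out.append(A[i, i].real)
            else:
                out.extend([A[i, j].real, A[i, j].imag])
    return np.array(out)

def basis(p):
    """Lower triangular unit matrices in the same order as `coordinates`."""
    for j in range(p):
        for i in range(j, p):
            for u in ([1.0] if i == j else [1.0, 1j]):
                E = np.zeros((p, p), dtype=complex)
                E[i, j] = u
                yield E

rng = np.random.default_rng(8)
p = 4
T = np.tril(rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p)), -1)
T += np.diag([1.5, 0.7, 2.0, 1.2])  # positive real diagonal

# V -> T V is linear, so column k of its Jacobian matrix is coordinates(T @ E_k).
J = np.column_stack([coordinates(T @ E) for E in basis(p)])
jacobian = abs(np.linalg.det(J))
formula = np.prod([T[i - 1, i - 1].real ** (2 * i - 1) for i in range(1, p + 1)])  # i is 1-based
print(jacobian, formula)
```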

As noted before, $J_2 = g(H)$ is a function of $H$. We avoid explicitly evaluating it by integrating it out in the next step to find the density of $T$.

The joint probability density of $T$ and $H$ is found by the change of variables

$$f(YY^H)\,(dY) = f[(TH)(TH)^H]\,J_1J_2J_3\,d(T, H) = f(TT^H)\,J_1J_2J_3\,d(T, H)$$

$$f(TT^H)\,J_1J_2J_3 = f(TT^H)\left(\prod_{i=1}^{p} t_{ii}^{2m}\right)g(H)\left(\prod_{i=1}^{p} t_{ii}^{-(2i-1)}\right)$$

Thus

$$p(T, H) = f(TT^H)\,g(H)\prod_{i=1}^{p} t_{ii}^{2(m-i)+1}$$

Integrating out $H$ to obtain the density of $T$ alone, we get

$$p(T) = \int p(T, H)\,(dH) = f(TT^H)\prod_{i=1}^{p} t_{ii}^{2(m-i)+1}\int g(H)\,(dH) \qquad \text{(E.5)}$$

where the integral is over all $H$ such that $HH^H = I_p$. Let $C_1 = \int g(H)\,(dH)$.

This $C_1$ will be evaluated later, and will be shown to contain the information about the distribution of $Y$. At this point, it is worth pointing out that $f(YY^H)$ can be any function of $YY^H$, and we are merely doing changes of variables. This only takes on importance in probability when we later choose a function $f$ of a quadratic which also turns out to be a probability density function. So, this derivation is really quite general. In the search for the complex Wishart density, we will choose $f(ZZ^H)$ such that $f(Z)$ is the complex matrix standard normal distribution.

Make the transformation

$$B = YY^H = (TH)(TH)^H = THH^HT^H = TT^H$$

By Khatri section (2.8) [137], which was proven as theorem 26,

$$J(B \to T) = 2^p\prod_{i=1}^{p} t_{ii}^{2(p-i)+1}$$

$T$ must be lower triangular for this to be true. Thus

$$J(T \to B) = 2^{-p}\prod_{i=1}^{p} t_{ii}^{-[2(p-i)+1]}$$
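Theorem 26 can also be spot-checked. Because $B = TT^H$ is quadratic in $T$, its differential in the direction $E$ is exactly $E\,T^H + T\,E^H$ (the same product rule used above). The sketch below (NumPy assumed; helper names hypothetical) assembles these directional derivatives into the $p^2 \times p^2$ real Jacobian matrix, using the lower triangle of the Hermitian matrix $B$ as its coordinates, and compares the determinant with $2^p\prod_{i=1}^p t_{ii}^{2(p-i)+1}$.

```python
import numpy as np

def coordinates(A):
    """Real coordinates of the lower triangle of A, column by column:
    each diagonal entry contributes its real part, each entry below it a (real, imag) pair."""
    p = A.shape[0]
    out = []
    for j in range(p):
        for i in range(j, p):
            if i == j:
                out.append(A[i, i].real)
            else:
                out.extend([A[i, j].real, A[i, j].imag])
    return np.array(out)

def basis(p):
    """Lower triangular unit matrices in the same order as `coordinates`."""
    for j in range(p):
        for i in range(j, p):
            for u in ([1.0] if i == j else [1.0, 1j]):
                E = np.zeros((p, p), dtype=complex)
                E[i, j] = u
                yield E

rng = np.random.default_rng(9)
p = 4
T = np.tril(rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p)), -1)
T += np.diag([1.5, 0.7, 2.0, 1.2])  # positive real diagonal

# dB = E T^H + T E^H is Hermitian, so its lower triangle gives p^2 real coordinates.
J = np.column_stack([coordinates(E @ T.conj().T + T @ E.conj().T) for E in basis(p)])
jacobian = abs(np.linalg.det(J))
formula = 2.0 ** p * np.prod([T[i - 1, i - 1].real ** (2 * (p - i) + 1) for i in range(1, p + 1)])
print(jacobian, formula)
```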
Hence, the density of $B$ is

$$p(B) = C_1 f(B)\prod_{i=1}^{p} t_{ii}^{2(m-i)+1}\,J(T \to B)$$

which simplifies to

$$p(B) = C_1 2^{-p}f(B)\prod_{i=1}^{p} t_{ii}^{2(m-p)} = C_1 2^{-p}f(B)\,|\det T|^{2(m-p)} = C_1 2^{-p}f(B)\,|\det(TT^H)|^{m-p}$$

since $T$ has real diagonal elements. Finally,

$$p(B) = C_1 2^{-p}|\det B|^{m-p}f(B) \qquad \text{(E.6)}$$

This is Srivastava's main result.

Specialization to Complex Wishart

Srivastava's main result is now specialized to the case where $Y \sim CN_{m,p}(0, I_m, I_p)$ by evaluating the constant $C_1$. The probability density for $Y$ is given by theorem 51 as

$$p(Y) = \pi^{-mp}\operatorname{etr}[-YY^H] = \pi^{-mp}\operatorname{etr}[-(TH)(TH)^H] = \pi^{-mp}\operatorname{etr}[-TT^H] = f(TT^H)$$

Thus, $f(B) = \pi^{-mp}\operatorname{etr}[-B]$.

Returning to equation E.5, we have

$$p(T) = C_1 f(TT^H)\prod_{i=1}^{p} t_{ii}^{2(m-i)+1} = C_1\pi^{-mp}\operatorname{etr}(-TT^H)\prod_{i=1}^{p} t_{ii}^{2(m-i)+1}$$
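Since $p(T)$ is a probability density, if we integrate over all $T$, we get unity.

$$1 = \int p(T)\,(dT) = C_1\pi^{-mp}\int\operatorname{etr}(-TT^H)\prod_{i=1}^{p} t_{ii}^{2(m-i)+1}\,(dT)$$

Concentrate on the integral. From the proof of theorem 26, note that

$$\operatorname{tr}(TT^H) = t_{11}^2 + |t_{21}|^2 + t_{22}^2 + |t_{31}|^2 + |t_{32}|^2 + t_{33}^2 + \cdots + |t_{p-1,1}|^2 + |t_{p-1,2}|^2 + \cdots + t_{p-1,p-1}^2 + |t_{p1}|^2 + |t_{p2}|^2 + \cdots + t_{pp}^2$$

The integral can thus be expanded as

$$\begin{aligned}
\mathcal{I} &= \int\operatorname{etr}(-TT^H)\prod_{i=1}^{p} t_{ii}^{2(m-i)+1}\,(dT)\\
&= \int\cdots\int e^{-\operatorname{tr}(TT^H)}\,t_{11}^{2(m-1)+1}t_{22}^{2(m-2)+1}\cdots t_{pp}^{2(m-p)+1}\,(dT)\\
&= \left(\int_0^\infty t_{11}^{2(m-1)+1}e^{-t_{11}^2}dt_{11}\right)\left(\int_0^\infty t_{22}^{2(m-2)+1}e^{-t_{22}^2}dt_{22}\right)\cdots\left(\int_0^\infty t_{pp}^{2(m-p)+1}e^{-t_{pp}^2}dt_{pp}\right)\\
&\quad\times\left(\int e^{-|t_{21}|^2}dt_{21}\right)\cdots\left(\int e^{-|t_{p,p-1}|^2}dt_{p,p-1}\right)\\
&= \left[\prod_{i=1}^{p}\left(\int_0^\infty t_{ii}^{2(m-i)+1}e^{-t_{ii}^2}dt_{ii}\right)\right]\left[\prod_{i=2}^{p}\prod_{j=1}^{i-1}\left(\int_{\mathbb{C}}e^{-|t_{ij}|^2}dt_{ij}\right)\right]
\end{aligned}$$

By lemma 64, $\int_{\mathbb{C}}e^{-|t|^2}dt = \pi$, and by lemma 65,

$$\int_0^\infty t^{2(m-i)+1}e^{-t^2}dt = \tfrac{1}{2}\Gamma(m-i+1), \qquad m - i + 1 > 0.$$

Therefore,

$$\mathcal{I} = \left[\prod_{i=1}^{p}\tfrac{1}{2}\Gamma(m-i+1)\right]\left[\prod_{i=2}^{p}\prod_{j=1}^{i-1}\pi\right] = 2^{-p}\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(m-i+1)$$

where $m - p + 1 > 0$. Returning to the evaluation of $C_1$, we see

$$1 = C_1\pi^{-mp}2^{-p}\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(m-i+1) = C_1 2^{-p}\pi^{p[\frac{1}{2}(p-1) - m]}\prod_{i=1}^{p}\Gamma(m-i+1)$$

Solving for $C_1$ yields

$$C_1 = \frac{1}{2^{-p}\pi^{p[\frac{1}{2}(p-1) - m]}\prod_{i=1}^{p}\Gamma(m-i+1)} = \frac{2^p\pi^{pm}}{\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(m-i+1)}$$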
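Substituting into equation E.6 yields theorem 67:

$$p(B) = \frac{2^p\pi^{pm}}{\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(m-i+1)}\,2^{-p}|\det B|^{m-p}f(B) = \frac{|\det B|^{m-p}f(B)}{\pi^{-pm}\,C\Gamma_p(m)}$$

Substituting $f(B) = \pi^{-mp}\operatorname{etr}(-B)$ gives us

$$p(B) = \frac{|\det B|^{m-p}\operatorname{etr}(-B)}{\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(m-i+1)} = \frac{|\det B|^{m-p}\operatorname{etr}(-B)}{C\Gamma_p(m)}, \qquad \text{for } m - p + 1 > 0$$

This is Goodman's result when $\Sigma = I$.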
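The factorization of the integral above also shows that, when $Y$ has density $\pi^{-mp}\operatorname{etr}(-YY^H)$, the elements of $T$ are independent, with $t_{ii}^2 \sim \text{Gamma}(m-i+1, 1)$ (density proportional to $s^{m-i}e^{-s}$) and each $t_{ij}$, $i > j$, standard complex normal. This complex analog of the Bartlett decomposition is checked by simulation below (NumPy assumed; sample sizes are hypothetical). The sketch relies on the uniqueness in lemma 20 to identify $T$ with the Cholesky factor of $B = YY^H$.

```python
import numpy as np

rng = np.random.default_rng(10)
p, m, reps = 3, 5, 20000
# Y is p x m with iid standard complex normal entries, i.e. density pi^{-mp} etr(-Y Y^H).
Y = (rng.standard_normal((reps, p, m)) + 1j * rng.standard_normal((reps, p, m))) / np.sqrt(2)
B = Y @ np.conj(np.swapaxes(Y, -1, -2))  # B = Y Y^H ~ CW_p(m, I)
T = np.linalg.cholesky(B)                # the unique T of lemma 20, since B = T T^H

diag_sq = np.abs(np.diagonal(T, axis1=-2, axis2=-1)) ** 2  # t_ii^2 for i = 1..p
rows, cols = np.tril_indices(p, -1)
offdiag_sq = np.abs(T[:, rows, cols]) ** 2                 # |t_ij|^2 for i > j
print(diag_sq.mean(axis=0), [m - i + 1 for i in range(1, p + 1)])  # Gamma(m - i + 1) means
print(diag_sq.var(axis=0))                                           # Gamma variances equal the means
print(offdiag_sq.mean())                                             # E|t_ij|^2 = 1
```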


Suppose that $Y \sim CN_{m,p}(0_{m\times p}, I_m, \Sigma_{p\times p})$. This is the same as having a random sample of size $m$ from the complex normal distribution $CN_p(0_p, \Sigma_{p\times p})$. By theorem 51, the density of $Y$ is

$$p(Y) = \pi^{-mp}|\Sigma|^{-m}\operatorname{etr}(-Y\Sigma^{-1}Y^H)$$

By corollary 36, since $\Sigma^{-1}$ is positive definite, it can be factored into $\Sigma^{-1} = TT^H$, where $T$ is $p \times p$ lower triangular with positive real diagonal elements. Thus

$$p(Y) = \pi^{-mp}|\Sigma|^{-m}\operatorname{etr}(-YTT^HY^H)$$

Let $X = YT$, which implies $Y = XT^{-1}$. By lemma 5, $J(Y \to X) = \prod_{i=1}^{p} t_{ii}^{-2m}$. Then $p(XT^{-1}) = \pi^{-mp}|\Sigma|^{-m}\operatorname{etr}(-XX^H)$.
We want to find the density function for $CW_p(m, \Sigma)$. We get $CW_p(m, \Sigma)$ variables by obtaining a random sample of size $m$ from the complex multivariate normal distribution $CN_p(0_p, \Sigma_{p\times p})$. This yields a complex matrix normal random variable $Y \sim CN_{m,p}(0_{m\times p}, I_m, \Sigma_{p\times p})$.

By corollary 36 there exists a unique $p \times p$ lower triangular matrix $L$ with positive diagonal elements such that $\Sigma = LL^H$. Let $B \sim CW_p(m, I)$. Then

$$W = LBL^H \sim CW_p(m, LIL^H) = CW_p(m, \Sigma)$$

by theorem 54. By theorem 24, $J(B \to W) = (\det\Sigma)^{-p}$. Thus

$$\begin{aligned}
f(W) &= p(L^{-1}WL^{-H})\,J(B \to W) = \frac{|\det(L^{-1}WL^{-H})|^{m-p}\operatorname{etr}(-L^{-1}WL^{-H})}{\pi^{p(p-1)/2}(\det\Sigma)^p\prod_{i=1}^{p}\Gamma(m-i+1)}(dW)\\
&= \frac{(\det L)^{-(m-p)}\,|\det W|^{m-p}\,(\det L^H)^{-(m-p)}\operatorname{etr}(-L^{-H}L^{-1}W)}{(\det\Sigma)^p\,C\Gamma_p(m)}(dW)\\
&= \frac{(\det\Sigma)^{-(m-p)}\,|\det W|^{m-p}\operatorname{etr}(-\Sigma^{-1}W)}{(\det\Sigma)^p\,C\Gamma_p(m)}(dW)
\end{aligned}$$

Therefore

$$f(W) = \frac{|\det W|^{m-p}\operatorname{etr}(-\Sigma^{-1}W)}{(\det\Sigma)^m\,C\Gamma_p(m)}(dW)$$

This is the same answer obtained by Goodman [92]. The extension of Srivastava's result [256] to $CW_p(m, \Sigma)$ was motivated by Arnold's introduction [31] to the proof of his Corollary to his Theorem 17.12, which is an extension of $W_p(m, I)$ to $W_p(m, \Sigma)$.


E.2 General Theorem on Density of Eigenvalues

Theorem 68 (Important) If the Hermitian matrix $X$ has a density of the form $g(f_1, f_2, \ldots, f_p)$, where $f_1 > f_2 > \cdots > f_p$ are the eigenvalues of $X$, then the joint density of the roots is given by

$$\frac{\pi^{p(p-1)}}{C\Gamma_p(p)}\,g(f_1, \ldots, f_p)\prod_{i<j}(f_i - f_j)^2$$

This is a complexification of theorem 13.3.1 of Anderson (p. 532) [26], which is also similar to theorem 3.2.17 of Muirhead [187].

Proof. I have followed Anderson's general logic, substituting the Jacobians and making the other changes necessary to transform it to the complex case. Recall from theorem 7 that the joint density of the eigenvalues $\{f_i\}_1^p$ that satisfy $\det[A - f(A + B)] = 0$, where $A \sim CW_p(m, I_p)$ and $B \sim CW_p(n, I_p)$, is given by

$$g(F) = \frac{\pi^{p(p-1)}C\Gamma_p(m+n)}{C\Gamma_p(m)\,C\Gamma_p(n)\,C\Gamma_p(p)}\prod_{i=1}^{p} f_i^{m-p}(1 - f_i)^{n-p}\prod_{i<j}(f_i - f_j)^2$$

Suppose we let $A = WW^H$, $G = CC^H = B + WW^H$, and $W = CU$. Then

$$\begin{aligned}
0 = \det[A - f(A + B)] &= \det[WW^H - fG]\\
&= \det[CUU^HC^H - fCC^H] = \det(C)\det(UU^H - fI_p)\det(C^H)
\end{aligned}$$

Therefore, the roots of $\det[A - f(A + B)] = 0$ are also the roots of

$$\det(UU^H - fI_p) = 0$$

The general strategy is to first do a change of variables using the eigenvalue decomposition $X = CFC^H$, where $F = \operatorname{diag}(f_1, \ldots, f_p)$ and $f_1 > \cdots > f_p$. We know from theorem 115 that we can do this. As we reasoned in theorem 7, we choose the phase $\theta_k$ of the scaling for each column $c_k$ of $C$ so that $c_{1k} > 0$, to force the transformation from $X$ to $(F, C)$ to be unique. Let the Jacobian of this transformation be $J[X \to (F, C)]$. Then the joint density of $(F, C)$ is

$$g(f_1, \ldots, f_p)\,J[X \to F, C]$$

The marginal density of $F$ is given by

$$\int_C g(X)\,J[X \to F, C]\,(dC) = g(X)\int_C J[X \to F, C]\,(dC)$$

since $g(X) = g(f_1, \ldots, f_p)$ depends on $X$ only through its eigenvalues. To evaluate $\int_C J[X \to F, C]\,(dC)$, we pick a distribution for $X$ for which we know the marginal density $g(F)$. We then have

$$g(X)\int_C J[X \to F, C]\,(dC) = g(F)$$

Thus

$$\int_C J[X \to F, C]\,(dC) = \frac{g(F)}{g(X)}$$

We want to choose a distribution for $X$ that will give us an answer easily. Let $X = UU^H$. In theorem 94 we constructed a random variable $U$ whose density function $g(U)$ depends on $U$ only through $UU^H$.
Using theorem 67, since $g(U)$ is a function of $UU^H$, the density of $X = UU^H$ is given by

$$g(X) = \frac{C\Gamma_p(m+n)}{C\Gamma_p(m)\,C\Gamma_p(n)}\,|\det X|^{m-p}\,|\det(I_p - X)|^{n-p}$$

Since the eigenvalues of $X$ are $f_1, \ldots, f_p$, the density of $X$ is a function of its eigenvalues:

$$g(X) = \frac{C\Gamma_p(m+n)}{C\Gamma_p(m)\,C\Gamma_p(n)}\prod_{i=1}^{p} f_i^{m-p}(1 - f_i)^{n-p}$$

where we know by lemma 54 that $I_p - X$ has eigenvalues $1 - f_i$. The joint density of $(F, C)$ is then $g(f_1, \ldots, f_p)\,J(X \to F, C)$. The marginal density of $F$ is given by

$$g(F) = g(f_1, \ldots, f_p)\int_C J(X \to F, C)\,(dC) = \frac{C\Gamma_p(m+n)}{C\Gamma_p(m)\,C\Gamma_p(n)}\prod_{i=1}^{p} f_i^{m-p}(1 - f_i)^{n-p}\int_C J(X \to F, C)\,(dC)$$

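We know $g(F)$ from the beginning of this proof to be

$$g(F) = \frac{\pi^{p(p-1)}C\Gamma_p(m+n)}{C\Gamma_p(m)\,C\Gamma_p(n)\,C\Gamma_p(p)}\prod_{i=1}^{p} f_i^{m-p}(1 - f_i)^{n-p}\prod_{i<j}(f_i - f_j)^2$$

Solving for the integral, we find

$$\int_C J(X \to F, C)\,(dC) = \frac{\dfrac{\pi^{p(p-1)}C\Gamma_p(m+n)}{C\Gamma_p(m)\,C\Gamma_p(n)\,C\Gamma_p(p)}\prod_{i=1}^{p} f_i^{m-p}(1 - f_i)^{n-p}\prod_{i<j}(f_i - f_j)^2}{\dfrac{C\Gamma_p(m+n)}{C\Gamma_p(m)\,C\Gamma_p(n)}\prod_{i=1}^{p} f_i^{m-p}(1 - f_i)^{n-p}} = \frac{\pi^{p(p-1)}}{C\Gamma_p(p)}\prod_{i<j}(f_i - f_j)^2$$

Therefore, the density of the roots of $X$ is

$$g(f_1, \ldots, f_p)\int_C J(X \to F, C)\,(dC) = \frac{\pi^{p(p-1)}}{C\Gamma_p(p)}\,g(f_1, \ldots, f_p)\prod_{i<j}(f_i - f_j)^2$$

which finishes the proof. □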


E.3 Joint Density of Eigenvalues of Complex Standard Wishart

Theorem 69 Let $A \sim CW_p(n, I_p)$. Then the joint density of the eigenvalues $l_1 > \cdots > l_p$ of $A$ is given by

$$\frac{\pi^{p(p-1)}}{C\Gamma_p(n)\,C\Gamma_p(p)}\prod_{i=1}^{p} l_i^{n-p}\exp\left(-\sum_{i=1}^{p} l_i\right)\prod_{i<j}(l_i - l_j)^2$$

This is a complexification of a theorem by Anderson (p. 534) [26]. It agrees with James [120] equation (95) for the case of $\Sigma = I_p$ and with Khatri [137] equation (7.1.7).

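Proof. The density of $A$ is given by

$$f(A) = \frac{|\det A|^{n-p}\operatorname{etr}(-A)}{C\Gamma_p(n)}(dA)$$

Since $|\det A| = \prod_{i=1}^p l_i$ and $\operatorname{tr}(A) = \sum_{i=1}^p l_i$, this density is a function of the eigenvalues alone. By theorem 68, the joint density of the eigenvalues of $A$ is given by

$$\frac{\pi^{p(p-1)}}{C\Gamma_p(p)}\cdot\frac{\prod_{i=1}^{p} l_i^{n-p}\exp\left(-\sum_{i=1}^{p} l_i\right)}{C\Gamma_p(n)}\prod_{i<j}(l_i - l_j)^2 = \frac{\pi^{p(p-1)}}{C\Gamma_p(n)\,C\Gamma_p(p)}\prod_{i=1}^{p} l_i^{n-p}\exp\left(-\sum_{i=1}^{p} l_i\right)\prod_{i<j}(l_i - l_j)^2$$

which completes the proof. □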
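As a sanity check on the normalizing constant of Theorem 69, the sketch below (NumPy assumed; function names are hypothetical) evaluates the density for $p = 2$ and integrates it numerically with the midpoint rule. Since the formula is symmetric in the eigenvalues, the integral over the ordered region $l_1 > l_2 > 0$ is half the integral over the whole positive quadrant. For $p = 2$ the constant $\pi^{p(p-1)}/[C\Gamma_p(n)\,C\Gamma_p(p)]$ reduces to $1/[\Gamma(n)\Gamma(n-1)]$, which is $1/12$ for $n = 4$.

```python
import math
import numpy as np

def log_complex_mv_gamma(p, n):
    """log C Gamma_p(n) = (p(p-1)/2) log(pi) + sum_{i=1}^p log Gamma(n - i + 1)."""
    return 0.5 * p * (p - 1) * math.log(math.pi) + sum(math.lgamma(n - i + 1) for i in range(1, p + 1))

def eigenvalue_density(lams, n):
    """Theorem 69 density of the eigenvalues lams[..., 0], ..., lams[..., p-1] of CW_p(n, I_p)."""
    p = lams.shape[-1]
    log_c = p * (p - 1) * math.log(math.pi) - log_complex_mv_gamma(p, n) - log_complex_mv_gamma(p, p)
    vandermonde_sq = np.ones(lams.shape[:-1])
    for i in range(p):
        for j in range(i + 1, p):
            vandermonde_sq *= (lams[..., i] - lams[..., j]) ** 2
    return math.exp(log_c) * np.prod(lams ** (n - p), axis=-1) * np.exp(-lams.sum(axis=-1)) * vandermonde_sq

# Midpoint rule over [0, 40]^2, halved to account for the ordering l1 > l2.
n, h, upper = 4, 0.04, 40.0
mid = np.arange(h / 2, upper, h)
L1, L2 = np.meshgrid(mid, mid, indexing="ij")
total = 0.5 * eigenvalue_density(np.stack([L1, L2], axis=-1), n).sum() * h * h
print(total)  # should be close to 1
```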


E.4  Joint  Density  of  Eigenvalues  of  Complex 
Wishart 


Theorem 70 (Important) Let $A \sim CW_p(n, \Sigma)$. Then the joint density of the eigenvalues $(l_1^2, \ldots, l_p^2)$ of $A$ is

$$ \frac{\pi^{p(p-1)}}{C\Gamma_p(n)\,C\Gamma_p(p)\,[\det\Sigma]^{n}}\;{}_0F_0\big(-\Sigma^{-1}, L^2\big)\prod_{i=1}^{p} l_i^{2(n-p)}\prod_{i<j}(l_i^2 - l_j^2)^2, $$

where $L^2 = \mathrm{diag}(l_1^2, \ldots, l_p^2)$.

James wrote this result down by inspection, without derivation, as his equation (95) [120] for the complex case, citing the similarity of form to the real case. My solution is written in terms of singular values rather than eigenvalues, and thus my $l_i^2$ corresponds to other authors' $l_i$. The proof that follows makes no reference to the case of real variables.

Proof. We begin by following the theme of an earlier paper by James [117]. The primary concept is to recognize that the distribution being sought is invariant over some group. This leads to the approach of averaging over that group. James [118][120] gave an introduction to the group structure that justified his approach. A more complete construction of the group under study is given in section H.6 of this thesis. First, note that the density of the central complex Wishart variable $A$ is

$$ g(A) = \frac{|\det A|^{n-p}\,\operatorname{etr}(-\Sigma^{-1}A)}{[\det\Sigma]^{n}\,C\Gamma_p(n)}. $$


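We want the distribution of the set $D$ of eigenvalues of $A$. Let $A = U_1DU_1^H$. Notice that $|\det A|^{n-p}$ is a function only of $D$.

The term we need to deal with is $\operatorname{etr}(-\Sigma^{-1}A)$. Recall that the similarity transformation $B = U^HAU$, where $U$ is a unitary matrix, $U \in U(p)$, leaves the eigenvalues unchanged. Moreover, a convex sum of distributions is again a distribution. Let $U_1, \ldots, U_r$ be $r$ fixed unitary matrices and let $\alpha_1, \ldots, \alpha_r$ be positive real numbers such that $\sum_{i=1}^{r}\alpha_i = 1$. Then if $A$ has the distribution

$$ \sum_{i=1}^{r}\alpha_i\,\frac{|\det A|^{n-p}\,\operatorname{etr}(-\Sigma^{-1}U_i^HAU_i)}{[\det\Sigma]^{n}\,C\Gamma_p(n)}\,(dA), $$

the distribution of $D$ is unchanged. With a suitable choice of a sequence of sets $\{U_i, \alpha_i\}$, this function tends to

$$ \frac{|\det A|^{n-p}}{[\det\Sigma]^{n}\,C\Gamma_p(n)}\int_{U(p)}\operatorname{etr}(-\Sigma^{-1}U^HAU)\,(dU)\,(dA). $$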


Notice that the function $\int_{U(p)}\operatorname{etr}(-\Sigma^{-1}U^HAU)(dU)$, after it is evaluated, is only a function of $\Sigma$ and the eigenvalues $D$ of $A$. Only the elements of $D$ are left as random variables. This ends the portion of the proof given in [117]. We now apply theorem 68. Thus, the density of $D$ is given by

$$ dF(D) = \frac{|\det A|^{n-p}\int_{U(p)}\operatorname{etr}(-\Sigma^{-1}U^HAU)\,(dU)}{[\det\Sigma]^{n}\,C\Gamma_p(n)}\,\frac{\pi^{p(p-1)}}{C\Gamma_p(p)}\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD). $$


Rearranging slightly, we get

$$ dF(D) = \frac{|\det A|^{n-p}\,\pi^{p(p-1)}}{[\det\Sigma]^{n}\,C\Gamma_p(n)\,C\Gamma_p(p)}\int_{U(p)}\operatorname{etr}(-\Sigma^{-1}U^HAU)\,(dU)\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD). \qquad (E.7) $$

Recall that $|\det A|^{n-p} = \prod_{i=1}^{p} l_i^{2(n-p)}$. This result is the complexification of Muirhead theorem 3.2.18.

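I supplied the next portion of this proof, up to the statement of the next corollary. Let us take a closer look at the integral. Let $A = U_1DU_1^H$ and $\Sigma = P\Lambda^2P^H$. Then

$$ \begin{aligned} \int_{U(p)}\operatorname{etr}(-\Sigma^{-1}U^HAU)\,(dU) &= \int_{U(p)}\operatorname{etr}\big(-(P\Lambda^2P^H)^{-1}U^H(U_1DU_1^H)U\big)\,(dU) \\ &= \int_{U(p)}\operatorname{etr}(-P\Lambda^{-2}P^HU^HU_1DU_1^HU)\,(dU) \\ &= \int_{U(p)}\operatorname{etr}(-\Lambda^{-2}P^HU^HU_1DU_1^HUP)\,(dU). \end{aligned} $$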

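Suppose $U$ and $P$ are both members of the set $U(p)$ of $p \times p$ unitary matrices. By definition, we know

$$ U^HU = UU^H = P^HP = PP^H = I_p. $$

This implies

$$ (UP)^H(UP) = P^HU^HUP = I_p. $$

Therefore the set $U(p)$ is closed under matrix multiplication. Let $V = U_1^HUP \in U(p)$. Then $(dU) = (dV)$ because $U_1^H$ and $P$ are unitary and the Haar measure is invariant under left and right multiplication by unitary matrices. Therefore our integral is

$$ \int_{U(p)}\operatorname{etr}(-\Lambda^{-2}V^HDV)\,(dV) = \int_{U(p)}\exp\Big(-\sum_{j=1}^{p}\sum_{i=1}^{p}\frac{l_i^2\,|v_{ij}|^2}{\lambda_j^2}\Big)\,(dV), $$

where $\Lambda^2 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_p^2)$ and $D = \mathrm{diag}(l_1^2, \ldots, l_p^2)$. To see this, write $V = (v_1, \ldots, v_p)$ with columns $v_j = (v_{1j}, \ldots, v_{pj})^T$. Note that $v_j^HDv_j$ is a scalar. Then

$$ v_j^HDv_j = \sum_{i=1}^{p} l_i^2\,|v_{ij}|^2 \quad\text{and}\quad (\Lambda^{-2}V^HDV)_{jj} = \frac{v_j^HDv_j}{\lambda_j^2}. $$

The trace becomes

$$ \operatorname{tr}(\Lambda^{-2}V^HDV) = \sum_{j=1}^{p}\frac{v_j^HDv_j}{\lambda_j^2} = \sum_{j=1}^{p}\sum_{i=1}^{p}\frac{l_i^2\,|v_{ij}|^2}{\lambda_j^2}. $$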
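The index bookkeeping in this trace expansion, and the fact used in Corollary 21 that the expansion collapses to $\sum_i l_i^2/\sigma^2$ when $\Lambda^2 = \sigma^2 I_p$ (each row of a unitary matrix has unit norm), can be spot-checked numerically. The following Python sketch is illustrative only: the helper names are hypothetical, and the random unitary $V$ is produced by Gram-Schmidt orthonormalization of a complex Gaussian matrix, a convenience choice rather than anything specified in the text.

```python
import random


def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def herm(X):
    """Conjugate transpose X^H."""
    return [[X[j][i].conjugate() for j in range(len(X))] for i in range(len(X[0]))]


def diag(d):
    """Square diagonal matrix with entries d."""
    return [[complex(d[i]) if i == j else 0j for j in range(len(d))] for i in range(len(d))]


def random_unitary(p, rng):
    """Orthonormalize the columns of a random complex Gaussian matrix (Gram-Schmidt)."""
    cols = []
    for _ in range(p):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(p)]
        for q in cols:
            proj = sum(q[i].conjugate() * v[i] for i in range(p))
            v = [v[i] - proj * q[i] for i in range(p)]
        norm = sum(abs(c) ** 2 for c in v) ** 0.5
        cols.append([c / norm for c in v])
    return [[cols[j][i] for j in range(p)] for i in range(p)]  # V[i][j] = v_ij


def trace_matrix_form(lam2, l2, V):
    """tr(Lambda^{-2} V^H D V) computed with explicit matrix products."""
    M = matmul(matmul(matmul(diag([1.0 / x for x in lam2]), herm(V)), diag(l2)), V)
    return sum(M[i][i] for i in range(len(M))).real


def trace_double_sum(lam2, l2, V):
    """The double sum sum_j sum_i l_i^2 |v_ij|^2 / lambda_j^2."""
    p = len(l2)
    return sum(l2[i] * abs(V[i][j]) ** 2 / lam2[j] for i in range(p) for j in range(p))
```

With `lam2 = [2.0] * p` the two functions return `sum(l2) / 2.0` for any unitary `V`, which is the simplification used in the proof of Corollary 21.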


Here  is  an  intermediate  result. 
Corollary 21 Let $A \sim CW_p(n, \Sigma = \sigma^2 I_p)$ have eigenvalue decomposition $A = U_1DU_1^H$, and let $V \in U(p)$, $D = \mathrm{diag}(l_1^2, \ldots, l_p^2)$. Then the joint density function of the sample eigenvalues is given by

$$ dF(D) = \frac{\pi^{p(p-1)}\exp\big(-\frac{1}{\sigma^2}\sum_{i=1}^{p} l_i^2\big)}{\sigma^{2np}\,C\Gamma_p(n)\,C\Gamma_p(p)}\Big[\prod_{i=1}^{p} l_i^{2(n-p)}\Big]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD). $$

This result is the complexification of Muirhead [187] corollary 3.2.19. When $\sigma^2 = 1$, this result is Khatri's equation (7.1.7) [137]. When $\sigma^2 = 1$ and $n$ is replaced by - , this is Krishnaiah and Schuurmann equation (3.1) [151].


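Proof. This is a complexification of Muirhead's proof. Here $\Lambda^{-2} = \frac{1}{\sigma^2}I_p$ and

$$ \int_{U(p)}\operatorname{etr}\Big(-\frac{1}{\sigma^2}V^HDV\Big)\,(dV) = \int_{U(p)}\operatorname{etr}\Big(-\frac{1}{\sigma^2}D\Big)\,(dV) = \exp\Big(-\frac{1}{\sigma^2}\sum_{i=1}^{p} l_i^2\Big), $$

since $VV^H = I_p$. Substitute this into equation E.7 to obtain the result, where $\det A = \prod_{i=1}^{p} l_i^2$. Note that the Haar measure has been normalized so that $\int_{U(p)}(dV) = 1$. □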
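Corollary 21 can be sanity-checked numerically. The Python sketch below is a hypothetical illustration, not code from this thesis: it assumes the standard complex multivariate gamma function $C\Gamma_p(n) = \pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(n-i+1)$, writes $x_i = l_i^2$, and, for $p = 2$, integrates the density over the ordered region $x_1 > x_2 > 0$ with the midpoint rule (using the symmetry of the integrand to integrate over a square and halve the result). A total mass close to 1 is evidence that the normalizing constant is consistent.

```python
import math


def complex_multigamma(p, n):
    """Assumed definition: CGamma_p(n) = pi^{p(p-1)/2} * prod_{i=1}^{p} Gamma(n - i + 1)."""
    return math.pi ** (p * (p - 1) / 2) * math.prod(
        math.gamma(n - i + 1) for i in range(1, p + 1))


def eig_density_sigma2I(x, n, sigma2):
    """Corollary 21 density at x = (l_1^2, ..., l_p^2) for A ~ CW_p(n, sigma2 * I_p)."""
    p = len(x)
    const = math.pi ** (p * (p - 1)) / (
        sigma2 ** (n * p) * complex_multigamma(p, n) * complex_multigamma(p, p))
    expo = math.exp(-sum(x) / sigma2)              # exp(-(1/sigma^2) sum_i l_i^2)
    powers = math.prod(xi ** (n - p) for xi in x)  # prod_i l_i^{2(n-p)}
    vand2 = math.prod((x[i] - x[j]) ** 2           # prod_{i<j} (l_i^2 - l_j^2)^2
                      for i in range(p) for j in range(i + 1, p))
    return const * expo * powers * vand2


def ordered_mass_p2(n, sigma2, steps=300):
    """Midpoint-rule mass of the p = 2 density over x1 > x2 > 0.

    The integrand is symmetric in (x1, x2), so the ordered region carries
    half of the integral over the square [0, 30 * sigma2]^2.
    """
    h = 30.0 * sigma2 / steps
    total = 0.0
    for i in range(steps):
        for j in range(steps):
            total += eig_density_sigma2I(((i + 0.5) * h, (j + 0.5) * h), n, sigma2)
    return 0.5 * total * h * h
```

For $p = 1$ the function reduces to a gamma density in $l_1^2$, and calling it with `sigma2 = lam2 / n` gives the substitution used in the proof of Corollary 22.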

We  build  slightly  on  this  intermediate  result. 


Corollary 22 Let $\Sigma = \lambda^2 I_p$ and $A \sim CW_p(n, \lambda^2 I_p)$. Let $S = \frac{1}{n}A$ and let $D_S = \mathrm{diag}(l_1^2, \ldots, l_p^2)$ be the eigenvalues of $S$. Then the joint density of the sample eigenvalues of $S$ is

$$ dF(D_S) = \frac{n^{np}\,\pi^{p(p-1)}}{\lambda^{2np}\,C\Gamma_p(n)\,C\Gamma_p(p)}\exp\Big(-\frac{n}{\lambda^2}\sum_{i=1}^{p} l_i^2\Big)\Big[\prod_{i=1}^{p} l_i^{2(n-p)}\Big]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD_S). $$

This is the complexification of Muirhead corollary 9.4.2 [187], which was stated without proof.


Proof. $S \sim CW_p(n, \frac{\lambda^2}{n}I_p)$ by lemma 16. Substitute $\sigma^2 = \lambda^2/n$ into the previous result. □

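We know $E\{S\} = \lambda^2 I_p$ by theorem 52. If $A \sim CW_p(n, \lambda^2 I_p, \Theta)$, then $S \sim CW_p(n, \frac{\lambda^2}{n}I_p, \frac{1}{n}\Theta)$ and $E\{S\} = \lambda^2 I_p + \frac{1}{n}\Theta$. The expression for $dF(D_S)$ would be considerably messier.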

We return to developing the central density function of the sample eigenvalues of a complex Wishart matrix. The term we must evaluate is

$$ \int_{U(p)}\operatorname{etr}(-\Sigma^{-1}U^HAU)\,(dU). $$

To do this, we draw from the work by Gross and Richards [96]. Observe that $\Sigma^{-1}$ and $A$ are both nonsingular Hermitian matrices, and that $U(p)$ is a maximal compact subgroup of $GL(p, \mathbb{C})$.

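We proceed, using the property $[\operatorname{tr}(X)]^d = \sum_{|m|=d} Z_m(X)$ of the zonal polynomials:

$$ \int_{U(p)}\operatorname{etr}(-\Sigma^{-1}U^HAU)\,(dU) = \int_{U(p)}\exp[\operatorname{tr}(-\Sigma^{-1}U^HAU)]\,(dU) = \int_{U(p)}\sum_{d=0}^{\infty}\frac{1}{d!}[\operatorname{tr}(-\Sigma^{-1}U^HAU)]^d\,(dU) = \sum_{d=0}^{\infty}\frac{1}{d!}\sum_{|m|=d}\int_{U(p)}Z_m(-\Sigma^{-1}U^HAU)\,(dU), $$

where $Z_m$ is the zonal polynomial of order $m$ with matrix argument $(-\Sigma^{-1}U^HAU)$. The summation over $|m| = d$ means that the sum extends over all partitions $m = (m_1, m_2, \ldots)$ with

$$ m_1 + m_2 + \cdots = d < \infty, $$

where

$$ m_1 \ge m_2 \ge \cdots \ge 0 $$

are integers. By the splitting theorem (proposition 41), $\int_{U(p)}Z_m(XU^HYU)(dU) = Z_m(X)Z_m(Y)/Z_m(I_p)$ for $p \times p$ Hermitian $X$ and $Y$, so we can decompose the integral.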


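$$ \int_{U(p)}\operatorname{etr}(-\Sigma^{-1}U^HAU)\,(dU) = \sum_{d=0}^{\infty}\frac{1}{d!}\sum_{|m|=d}\int_{U(p)}Z_m(-\Sigma^{-1}U^HAU)\,(dU) = \sum_{d=0}^{\infty}\frac{1}{d!}\sum_{|m|=d}\frac{Z_m(-\Sigma^{-1})\,Z_m(A)}{Z_m(I_p)}. $$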

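Thus, in terms of zonal polynomials,

$$ dF(D) = \frac{|\det A|^{n-p}\,\pi^{p(p-1)}}{[\det\Sigma]^{n}\,C\Gamma_p(n)\,C\Gamma_p(p)}\Big[\sum_{d=0}^{\infty}\frac{1}{d!}\sum_{|m|=d}\frac{Z_m(-\Sigma^{-1})\,Z_m(A)}{Z_m(I_p)}\Big]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD). $$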


By definition 90, this becomes

$$ dF(D) = \frac{|\det A|^{n-p}\,\pi^{p(p-1)}}{[\det\Sigma]^{n}\,C\Gamma_p(n)\,C\Gamma_p(p)}\;{}_0F_0(-\Sigma^{-1}, A)\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD). $$


This is the form as given in James [120] equation (95). In [119], James stated that the zonal polynomials for the case of real positive definite symmetric matrices are the same for the complex orthogonal group in the complex full linear group, and the real orthogonal group in the unitary group. A contribution of this thesis is applying Gross and Richards' work [96], which is valid for Hermitian positive definite matrices. In particular, we are working with the unitary group in the complex general linear group. The appearance of the expression is the same. Its meaning is now extended to include our signal processing problem. This is the complex version of Muirhead [187] theorem 9.4.1.

Proof of this result entirely in the context of the complex Wishart distribution is one of the major contributions of this research. The key insights were provided by Gross and Richards [96]. Because of the great importance of that result, and because the mathematics it requires is uncommon among engineers, their paper is explained with commentary in appendix G.

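From Gross and Richards equation (5.4.5) we know that for a Hermitian matrix $X = X^H$, zonal polynomials have the property that the value of $Z_m$ at $X$ is uniquely determined by the eigenvalues of $X$. That is, $Z_m(X) = Z_m(\Lambda_X)$, where $\Lambda_X$ is the diagonal matrix of eigenvalues of $X$. We can equivalently say that $Z_m(U^HXU) = Z_m(X)$ for all $U \in U(n)$. Thus, in our problem, we know $Z_m(A) = Z_m(D)$ by using $U_1$, and $Z_m(-\Sigma^{-1}) = Z_m(-\Lambda^{-2})$ by using $P$, where $\Lambda^2$ is the matrix of eigenvalues of $\Sigma$. From this, we get the form

$$ dF(D) = \frac{|\det A|^{n-p}\,\pi^{p(p-1)}}{[\det\Sigma]^{n}\,C\Gamma_p(n)\,C\Gamma_p(p)}\Big[\sum_{d=0}^{\infty}\frac{1}{d!}\sum_{|m|=d}\frac{Z_m(-\Lambda^{-2})\,Z_m(D)}{Z_m(I_p)}\Big]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD). \qquad (E.8) $$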
The next result is one that I think is wrong. In the unlikely possibility that it is right, it opens up new practical possibilities. Recall the definition that ${}_0F_0(-\Lambda^{-2}, D) = \operatorname{etr}(-\Lambda^{-2}D)$. Then

$$ dF(D) = \frac{|\det A|^{n-p}\,\pi^{p(p-1)}}{[\det\Sigma]^{n}\,C\Gamma_p(n)\,C\Gamma_p(p)}\big[\operatorname{etr}(-\Lambda^{-2}D)\big]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD) \qquad (E.10) $$

$$ \phantom{dF(D)} = \frac{|\det A|^{n-p}\,\pi^{p(p-1)}}{[\det\Sigma]^{n}\,C\Gamma_p(n)\,C\Gamma_p(p)}\Big[\exp\Big(-\sum_{i=1}^{p}\frac{l_i^2}{\lambda_i^2}\Big)\Big]\prod_{i<j}(l_i^2 - l_j^2)^2\,(dD). \qquad (E.11) $$

What is wrong with this is that, in general, $\operatorname{etr}(-\Sigma^{-1}A) \ne \operatorname{etr}(-\Lambda^{-2}D)$. That gives a practical, immediate answer. Stopping the questioning process here, though, eliminates the insights we need to find out why the derivation fundamentally fails. I think that some steps of the derivation perhaps should have been only one-way implications ($\Rightarrow$) rather than equalities [($\Leftarrow$) and ($\Rightarrow$)].


E.5  Distribution  of  F(2n,  2n) 

I  have  supplied  all  of  the  work  in  this  section. 

One of the proposed tests to determine the number of significant eigenvalues of a complex Wishart matrix involves obtaining two independent sets of data from which two independent complex Wishart matrices are formed. When the numbers of data samples used in forming those matrices are identical and even, a simple closed form of the cumulative distribution function for the relevant test statistic can be derived. The appropriate test statistic was shown in theorem 6 to be distributed according to the F distribution with $2n_1$ and $2n_2$ degrees of freedom. Corollary 1 specialized the result to apply specifically to comparing linear combinations of sample eigenvalues.


Attention  is  drawn  to  the  work  of  Lentner  [164]  who  developed  a  set  of 
expressions  for  arbitrary  positive  integer  degrees  of  freedom.  When  his  work 
is  restricted  to  the  special  case  considered  here,  the  results  can  be  shown  to 
be  identical.  The  contribution  of  the  following  work  is  that  there  is  a  simple 
form  when  the  condition  of  even  degrees  of  freedom  can  be  reasonably  met. 

The  problem  was  solved  by  direct  integration  of  the  probability  density 
function.  Reduction  of  the  resulting  expression  to  its  final  form  was  made 
possible  through  application  of  an  identity  from  combinatorics. 


Theorem 71 Let $n$ be an even positive integer. Then the cumulative distribution function for the $F(n, n)$ distribution is given by

$$ \Pr\{F \le f\} = (f+1)^{1-n}\sum_{k=n/2}^{n-1}\binom{n-1}{k}f^k. $$

Discussion. The coefficients of $f^k$ are the right half of the $n$-th row of Pascal's Triangle (counting the single 1 at the apex as the first row). This provides an efficient means for testing the significance of principal components in the small sample case.
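Theorem 71 is straightforward to evaluate in code. The Python sketch below (function names are hypothetical) takes the right half of the relevant row of Pascal's Triangle from `math.comb`, evaluates the polynomial by Horner's rule, and, as an independent check, integrates the $F(n, n)$ density $\Gamma(n)/[\Gamma(n/2)]^2 \cdot w^{n/2-1}/(1+w)^n$ from the proof with Simpson's rule.

```python
import math


def f_cdf_even(f, n):
    """Pr{F <= f} for F ~ F(n, n) with n even, following Theorem 71."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    if f <= 0:
        return 0.0
    # Right half of the Pascal's Triangle row: C(n-1, k) for k = n/2, ..., n-1.
    coeffs = [math.comb(n - 1, k) for k in range(n // 2, n)]
    # Horner's rule for sum_j coeffs[j] * f^j; the common factor f^(n/2) is applied afterwards.
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * f + c
    return acc * f ** (n // 2) / (f + 1) ** (n - 1)


def f_pdf(w, n):
    """F(n, n) density: Gamma(n) / Gamma(n/2)^2 * w^(n/2 - 1) / (1 + w)^n."""
    return math.gamma(n) / math.gamma(n / 2) ** 2 * w ** (n / 2 - 1) / (1 + w) ** n


def f_cdf_numeric(f, n, steps=2000):
    """Composite Simpson's rule for the integral of f_pdf over [0, f] (steps must be even)."""
    h = f / steps
    s = f_pdf(0.0, n) + f_pdf(f, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f_pdf(i * h, n)
    return s * h / 3
```

For example, `f_cdf_even(1.0, 8)` returns `0.5`, since $(35 + 21 + 7 + 1)/2^7 = 1/2$; the median of $F(n, n)$ is 1.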

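Proof. Hogg and Craig [109] (p. 146) define the cumulative F distribution in terms of the probability density function as follows:

$$ \Pr\{F \le f\} = \int_0^f g(w)\,dw, \quad 0 < f < \infty, \qquad (E.12) $$

where

$$ g(w) = \frac{\Gamma\big(\frac{r_1+r_2}{2}\big)\,(r_1/r_2)^{r_1/2}}{\Gamma\big(\frac{r_1}{2}\big)\,\Gamma\big(\frac{r_2}{2}\big)}\;\frac{w^{(r_1/2)-1}}{\big(1 + \frac{r_1 w}{r_2}\big)^{(r_1+r_2)/2}}, \quad 0 < w < \infty. \qquad (E.13) $$

This is the probability that the random variable $F$ is less than or equal to the value $f$.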

For the eigenvalue test, the F statistic has the same number of degrees of freedom in the numerator and the denominator. This means that $r_1 = r_2 = n$. This simplifies equation E.13 to $g(f) = \Phi(n)\,h(n, f)$, where

$$ \Phi(n) = \frac{(n-1)!}{[\Gamma(n/2)]^2} $$

and

$$ h(n, f) = \frac{f^{(n-2)/2}}{(1+f)^n}. $$

The resulting integral still must be altered to ease the integration. The strategy is to take advantage of the property of $g(f)$ being a probability density function. This allows the problem to be cast into an integral whose upper limit is infinite. An additional change of variables brings the lower limit to zero, which allows the application of complex integration. The first step, then, is

$$ \Pr\{F \le f\} = \int_0^f \Phi(n)\,h(n, w)\,dw = 1 - \Phi(n)\int_f^\infty h(n, w)\,dw, $$

$$ \Pr\{F \le f\} = 1 - \Phi(n)\,K(n, f), \qquad (E.14) $$

where $K(n, f) = \int_f^\infty h(n, w)\,dw$ is the upper-tail integral of $h(n, \cdot)$. To make the lower limit of integration zero, change variables to $x = w - f$. Then $w = x + f$, which implies $dw = dx$. The limits change from $w \in [f, \infty)$ to $x \in [0, \infty)$. With this change, the integral becomes

$$ K(n, f) = \int_0^\infty \frac{(x+f)^{(n-2)/2}}{(1+x+f)^n}\,dx. $$


Let $z$ be a complex variable and let

$$ h(n, z) = \frac{(z+f)^{(n-2)/2}}{(1+z+f)^n}. $$

Note that $h(n, z)$ is not an "even" function, so the integral over $[0, \infty)$ cannot be obtained as half of an integral over the whole real line. To make the integration more tractable, restrict $n$ to the set of even positive integers. That is, let $n = 2m$ where $m$ is a positive integer. This gets rid of evaluating a square root in the numerator. The function becomes

$$ h(2m, z) = (z+f+1)^{-2m}(z+f)^{m-1}. $$

Note that $h(2m, z)$ has a pole of order $2m$ located at $z_0 = (f+1)e^{i\pi}$. Because $h(2m, z)$ is not an "even" function, it becomes advantageous to use the integration technique given by Hayek [103].


E.5.1  Integration 


Consider the integral

$$ \oint h(2m, z)\log z\,dz. $$

The path of complex integration is given in figure E.1.

[Figure E.1. Integration Contour to Get Cumulative Distribution Function. Axes: Re, Im; points on the contour are written $z = R\exp(i\theta)$.]

The integral is

$$ \oint = \int_{L_1} + \int_{C_R} + \int_{L_2} + \int_{C_\varepsilon} = 2\pi i\,\mathrm{RES}\big[h(2m, z)\log z,\ (f+1)e^{i\pi}\big]. $$

Examine the integral of the function along the outer circle as the radius is allowed to become infinite:

$$ \lim_{R\to\infty}\int_{C_R} = \lim_{R\to\infty}\int_0^{2\pi}\frac{(Re^{i\theta}+f)^{m-1}}{(Re^{i\theta}+f+1)^{2m}}\,(i\theta + \log R)\,iRe^{i\theta}\,d\theta \sim \lim_{R\to\infty}\frac{R^{m-1}\cdot R\log R}{R^{2m}} = \lim_{R\to\infty}\frac{\log R}{R^m} = 0, \quad m > 0. \qquad (E.15) $$

Recall that we required $m$ to be a positive integer in our hypothesis, so the condition on $m$ is satisfied.
Table E.1. Change of Variables

  Line   z               dz    log z
  L1     R               dR    log R
  L2     R e^{i2π}       dR    i2π + log R

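Examine the integral of the function along the inner circle as its radius is allowed to shrink to zero:

$$ \lim_{\varepsilon\to 0}\int_{C_\varepsilon} = \lim_{\varepsilon\to 0}\int_{2\pi}^{0}\frac{(\varepsilon e^{i\theta}+f)^{m-1}}{(\varepsilon e^{i\theta}+f+1)^{2m}}\,(i\theta + \log\varepsilon)\,i\varepsilon e^{i\theta}\,d\theta \sim \lim_{\varepsilon\to 0}\frac{f^{m-1}}{(f+1)^{2m}}\,\varepsilon\log\varepsilon = 0. \qquad (E.16) $$

The term $\log\varepsilon$ dominates $i\theta$ as $\varepsilon$ goes to zero. Invoking L'Hôpital's Rule on $\varepsilon\log\varepsilon$ demonstrates that this quantity goes to zero as the limit is applied. Note that there are no poles inside $C_\varepsilon$. The integral reduces to

$$ \oint = \int_{L_1} + \int_{L_2} = 2\pi i\,\mathrm{RES}\big[h(2m, z)\log z,\ (f+1)e^{i\pi}\big]. \qquad (E.17) $$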

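The changes of variables for evaluating the integrals are given in table E.1. The integrals which are evaluated are given in table E.2. Substituting these into equation E.17 yields

$$ \int_0^\infty\frac{(R+f)^{m-1}}{(R+f+1)^{2m}}\log R\,dR - \int_0^\infty\frac{(R+f)^{m-1}}{(R+f+1)^{2m}}(i2\pi + \log R)\,dR = -2\pi i\int_0^\infty\frac{(R+f)^{m-1}}{(R+f+1)^{2m}}\,dR = 2\pi i\,\mathrm{RES}\big[h(2m, z)\log z,\ (f+1)e^{i\pi}\big]. $$

Rearranging to obtain the integral of interest yields

$$ K(2m, f) = \int_0^\infty\frac{(R+f)^{m-1}}{(R+f+1)^{2m}}\,dR = -\mathrm{RES}\big[h(2m, z)\log z,\ (f+1)e^{i\pi}\big]. \qquad (E.18) $$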
Table E.2. Line Integral Evaluation

  Line   ∫ h(2m, z)(log z) dz
  L1     $\int_0^\infty \frac{(R+f)^{m-1}}{(R+f+1)^{2m}}\log R\,dR$
  L2     $-\int_0^\infty \frac{(R+f)^{m-1}}{(R+f+1)^{2m}}(i2\pi + \log R)\,dR$

E.5.2 Evaluation of Residue


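To complete the evaluation for the case of even values of $n$, the residue must be evaluated at the pole located at $z_0 = (f+1)e^{i\pi}$. Recall that this pole is of order $2m$. The residue is evaluated by

$$ \begin{aligned} \mathrm{RES}\big[h(2m, z)\log z,\ (f+1)e^{i\pi}\big] &= \lim_{z\to(f+1)\exp(i\pi)}\frac{1}{(2m-1)!}\frac{d^{2m-1}}{dz^{2m-1}}\Big[\big(z - (f+1)e^{i\pi}\big)^{2m}h(2m, z)\log z\Big] \\ &= \lim_{z\to(f+1)\exp(i\pi)}\frac{1}{(2m-1)!}\frac{d^{2m-1}}{dz^{2m-1}}\Big[(z+f+1)^{2m}\,\frac{(z+f)^{m-1}}{(z+f+1)^{2m}}\log z\Big] \\ &= \lim_{z\to(f+1)\exp(i\pi)}\frac{1}{(2m-1)!}\frac{d^{2m-1}}{dz^{2m-1}}\Big[(z+f)^{m-1}\log z\Big]. \end{aligned} \qquad (E.19) $$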


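To evaluate this high order derivative, recall that

$$ \frac{d^k}{dz^k} = \Big(\frac{dR}{dz}\Big)^k\frac{d^k}{dR^k}. $$

Let $z = Re^{i\theta}$ be a change of variables, with $\theta$ held fixed. This implies $dz = e^{i\theta}dR$, $\frac{dR}{dz} = e^{-i\theta}$, and $\big(\frac{dR}{dz}\big)^k = e^{-ik\theta}$. Let $\theta = \pi$. Then

$$ \frac{d^k}{dz^k} = (-1)^k\frac{d^k}{dR^k}. \qquad (E.20) $$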
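Substituting equation E.20 into equation E.19 yields

$$ \mathrm{RES}\big[h(2m, z)\log z,\ (f+1)e^{i\pi}\big] = \lim_{R\to f+1}\frac{(-1)^{2m-1}}{(2m-1)!}\frac{d^{2m-1}}{dR^{2m-1}}\Big[(Re^{i\pi}+f)^{m-1}\log(Re^{i\pi})\Big], $$

where the pole is approached from the origin along the ray $\theta = \pi$. This equals

$$ \lim_{R\to f+1}\frac{-1}{(2m-1)!}\frac{d^{2m-1}}{dR^{2m-1}}\Big[(f-R)^{m-1}(i\pi + \log R)\Big] = \lim_{R\to f+1}\frac{-1}{(2m-1)!}\Big[i\pi\frac{d^{2m-1}}{dR^{2m-1}}(f-R)^{m-1} + \frac{d^{2m-1}}{dR^{2m-1}}\big((f-R)^{m-1}\log R\big)\Big]. \qquad (E.21) $$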


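Several other identities are provided below to aid the evaluation of the required derivatives.

$$ \frac{d^k}{dx^k}(a - x)^m = \begin{cases} (-1)^k\dfrac{m!}{(m-k)!}(a-x)^{m-k}, & k \le m, \\ 0, & k > m. \end{cases} \qquad (E.22) $$

$$ \frac{d^k}{dx^k}\ln x = x^{-k}(-1)^{k-1}(k-1)!, \quad k \ge 1. \qquad (E.23) $$

$$ \frac{d^n}{dx^n}(uv) = \sum_{k=0}^{n}\binom{n}{k}\frac{d^k u}{dx^k}\,\frac{d^{n-k}v}{dx^{n-k}}. \qquad (E.24) $$

Equations E.23 and E.24 are taken from Tuma [268] (p. 86).

Substituting these derivatives into equation E.21 yields

$$ \lim_{R\to f+1}\frac{-1}{(2m-1)!}\frac{d^{2m-1}}{dR^{2m-1}}\Big[(f-R)^{m-1}\log R\Big] = \lim_{R\to f+1}\frac{-1}{(2m-1)!}\sum_{k=0}^{2m-1}\binom{2m-1}{k}\Big[\frac{d^k}{dR^k}(f-R)^{m-1}\Big]\Big[\frac{d^{2m-1-k}}{dR^{2m-1-k}}\log R\Big]. $$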


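Notice that the second condition of equation E.22 causes the first derivative term of equation E.21 to go to zero, and the summation limit in the second derivative term to become $m - 1$. Therefore,

$$ \begin{aligned} \mathrm{RES}\big[h(2m, z)\log z,\ (f+1)e^{i\pi}\big] &= \lim_{R\to f+1}\frac{-1}{(2m-1)!}\sum_{k=0}^{m-1}\frac{(2m-1)!}{k!\,(2m-1-k)!}\Big[\frac{(-1)^k(m-1)!}{(m-1-k)!}(f-R)^{m-1-k}\Big]\Big[\frac{(-1)^{2m-2-k}(2m-2-k)!}{R^{2m-1-k}}\Big] \\ &= \lim_{R\to f+1}-\sum_{k=0}^{m-1}\frac{(2m-2-k)!\,(m-1)!}{k!\,(2m-1-k)!\,(m-1-k)!}\,\frac{(f-R)^{m-1-k}}{R^{2m-1-k}} \\ &= \lim_{R\to f+1}-\sum_{k=0}^{m-1}\frac{(m-1)!}{k!\,(2m-1-k)\,(m-1-k)!}\,\frac{(f-R)^{m-1-k}}{R^{2m-1-k}}. \end{aligned} \qquad (E.25) $$

Applying the limit on $R$ to equation E.25 yields

$$ \mathrm{RES}\big[h(2m, z)\log z,\ (f+1)e^{i\pi}\big] = -\sum_{k=0}^{m-1}\frac{(m-1)!}{k!\,(m-1-k)!}\,\frac{(-1)^{m-1-k}}{(2m-1-k)(f+1)^{2m-1-k}} = -\sum_{k=0}^{m-1}\binom{m-1}{k}\frac{(-1)^{m-1-k}}{(2m-1-k)(f+1)^{2m-1-k}}. $$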

Substituting this into equation E.18 gives us the integral required by equation E.14:

$$ \Pr\{F \le f\}_{2m} = 1 - \Phi(2m)\,K(2m, f) = 1 - \frac{(2m-1)!}{[(m-1)!]^2}\sum_{k=0}^{m-1}\frac{(m-1)!\,(-1)^{m-1-k}}{k!\,(m-1-k)!\,(2m-1-k)\,(f+1)^{2m-1-k}}. \qquad (E.26) $$
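The residue result can be checked independently of the contour argument. In the Python sketch below (hypothetical helper names), `K_closed` implements $K(2m, f) = -\mathrm{RES}[\cdot]$ from the limit of E.25, while `cdf_E26` and `cdf_E32` implement E.26 and the Pascal's Triangle form that the simplification below arrives at (E.32), both in exact rational arithmetic. `K_numeric` evaluates $K(2m, f)$ by Simpson's rule after the substitution $u = 1/(1+w)$, which turns the tail integral into $\int_0^{1/(1+f)} u^{m-1}(1-u)^{m-1}\,du$; that substitution is a convenience for the check, not a step used in the thesis.

```python
from fractions import Fraction
import math


def K_closed(m, f):
    """K(2m, f) = -RES[h(2m, z) log z, (f+1) e^{i pi}] after the limit R -> f + 1."""
    return sum(Fraction(math.comb(m - 1, k) * (-1) ** (m - 1 - k), 2 * m - 1 - k)
               / (f + 1) ** (2 * m - 1 - k) for k in range(m))


def cdf_E26(m, f):
    """Pr{F <= f} = 1 - Phi(2m) K(2m, f), with Phi(2m) = (2m-1)! / [(m-1)!]^2."""
    phi = Fraction(math.factorial(2 * m - 1), math.factorial(m - 1) ** 2)
    return 1 - phi * K_closed(m, f)


def cdf_E32(m, f):
    """Pr{F <= f} = (f+1)^{-(n-1)} sum_{k=n/2}^{n-1} C(n-1, k) f^k with n = 2m."""
    n = 2 * m
    return sum(math.comb(n - 1, k) * f ** k for k in range(m, n)) / (f + 1) ** (n - 1)


def K_numeric(m, f, steps=2000):
    """Simpson's rule for K(2m, f) after the substitution u = 1 / (1 + w)."""
    b = 1.0 / (1.0 + f)
    h = b / steps
    g = lambda u: u ** (m - 1) * (1.0 - u) ** (m - 1)
    s = g(0.0) + g(b) + sum((4 if i % 2 else 2) * g(i * h) for i in range(1, steps))
    return s * h / 3.0
```

Passing `fractions.Fraction` values for `f` makes the comparison of E.26 with E.32 exact rather than approximate.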


E.5.3  Simplification 
The reduction to a function of Pascal's Triangle is possible through cancelling factorials, performing a binomial expansion, and recognizing a combinatorial identity. Working on equation E.26, we get

$$ \Pr\{F \le f\}_{2m} = 1 - \sum_{k=0}^{m-1}\frac{(2m-1)!}{(m-1)!}\,\frac{(-1)^{m-1-k}}{k!\,(m-1-k)!\,(2m-1-k)\,(f+1)^{2m-1-k}}. \qquad (E.27) $$

Pulling the $(f+1)^{-(2m-1)}$ term out of the summation and reshuffling factorials gives us

$$ \Pr\{F \le f\}_{2m} = 1 - \frac{1}{(f+1)^{2m-1}}\,\frac{(2m-1)!}{(m-1)!}\sum_{k=0}^{m-1}\frac{(-1)^{m-1-k}\,(f+1)^k}{k!\,(m-1-k)!\,(2m-1-k)}. \qquad (E.28) $$

Applying the binomial expansion to $(f+1)^k$ yields

$$ \begin{aligned} \Pr\{F \le f\}_{2m} &= 1 - \frac{1}{(f+1)^{2m-1}}\,\frac{(2m-1)!}{(m-1)!}\sum_{k=0}^{m-1}\frac{(-1)^{m-1-k}}{k!\,(m-1-k)!\,(2m-1-k)}\sum_{j=0}^{k}\binom{k}{j}f^j \\ &= 1 - \frac{1}{(f+1)^{2m-1}}\,\frac{(2m-1)!}{(m-1)!}\sum_{k=0}^{m-1}\frac{(-1)^{m-1-k}}{(m-1-k)!\,(2m-1-k)}\sum_{j=0}^{k}\frac{f^j}{j!\,(k-j)!}. \end{aligned} \qquad (E.29) $$

The next stage of simplification requires a study of the expansion of equation E.29. The goal is to pull the powers of $f$ out of the inner-most summation. Placing individual terms of the equation E.29 summation into table E.3 allows recognition of a pattern that permits a change of indices.
Using the observation that the summation may be reordered allows equation E.29 to be rewritten as

$$ \Pr\{F \le f\} = 1 - \frac{1}{(f+1)^{2m-1}}\,\frac{(2m-1)!}{(m-1)!}\sum_{k=0}^{m-1}\frac{f^k}{k!}\sum_{j=0}^{m-1-k}\frac{(-1)^j}{j!\,(m-1-k-j)!\,(m+j)}. \qquad (E.30) $$


Examine just the right-most summation:

$$ \sum_{j=0}^{m-1-k}\frac{(-1)^j}{j!\,(m-1-k-j)!\,(m+j)} = \frac{1}{(m-1-k)!}\sum_{j=0}^{m-1-k}\binom{m-1-k}{j}\frac{(-1)^j}{m+j} = \frac{1}{m\,(m-1-k)!\,\binom{2m-1-k}{m-1-k}}. \qquad (E.31) $$
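The combinatorial identity behind E.31 can be confirmed exactly for small cases. The Python sketch below (hypothetical helper names) compares the inner sum with its closed form using `fractions.Fraction`, and also checks the value $(m-1)!/(2m-1-k)!$ that remains once the factorials cancel in the next step.

```python
from fractions import Fraction
import math


def inner_sum(m, k):
    """Right-most sum of E.30: sum_{j=0}^{m-1-k} (-1)^j / (j! (m-1-k-j)! (m+j))."""
    N = m - 1 - k
    return sum(Fraction((-1) ** j, math.factorial(j) * math.factorial(N - j) * (m + j))
               for j in range(N + 1))


def inner_sum_closed(m, k):
    """Right-hand side of E.31: 1 / (m (m-1-k)! C(2m-1-k, m-1-k))."""
    N = m - 1 - k
    return Fraction(1, m * math.factorial(N) * math.comb(2 * m - 1 - k, N))
```

For instance, `inner_sum(2, 0)` and `inner_sum_closed(2, 0)` both equal `Fraction(1, 6)`.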


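The last step is made possible by an identity in Riordan [223] (p. 47). Substituting equation E.31 back into equation E.30 gives us

$$ \begin{aligned} \Pr\{F \le f\} &= 1 - \frac{1}{(f+1)^{2m-1}}\,\frac{(2m-1)!}{(m-1)!}\sum_{k=0}^{m-1}\frac{f^k}{k!}\,\frac{(m-1-k)!\,m!}{m\,(m-1-k)!\,(2m-1-k)!} \\ &= 1 - \frac{1}{(f+1)^{2m-1}}\sum_{k=0}^{m-1}\frac{(2m-1)!\,m\,(m-1)!}{(m-1)!\,k!\,m\,(2m-1-k)!}f^k \\ &= 1 - \frac{1}{(f+1)^{2m-1}}\sum_{k=0}^{m-1}\frac{(2m-1)!}{k!\,(2m-1-k)!}f^k = 1 - \frac{1}{(f+1)^{2m-1}}\sum_{k=0}^{m-1}\binom{2m-1}{k}f^k \\ &= \frac{1}{(f+1)^{2m-1}}\Big[(f+1)^{2m-1} - \sum_{k=0}^{m-1}\binom{2m-1}{k}f^k\Big] \\ &= \frac{1}{(f+1)^{2m-1}}\Big[\sum_{k=0}^{2m-1}\binom{2m-1}{k}f^k - \sum_{k=0}^{m-1}\binom{2m-1}{k}f^k\Big], \end{aligned} $$

where the last line was obtained by the binomial expansion of $(f+1)^{2m-1}$. Thus

$$ \Pr\{F \le f\} = \frac{1}{(f+1)^{2m-1}}\sum_{k=m}^{2m-1}\binom{2m-1}{k}f^k. $$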


Recall that $n = 2m$ for positive integer $m$. Then we obtain the final result that

$$ \Pr\{F \le f\} = \frac{1}{(f+1)^{n-1}}\sum_{k=n/2}^{n-1}\binom{n-1}{k}f^k. \qquad (E.32) $$

The coefficients of $f^k$ come from the right half of Pascal's Triangle. Recall that Pascal's Triangle takes on the form given in table E.4. For example, when $n = 8$ we get

$$ \Pr\{F \le f\}_8 = \frac{1}{(f+1)^7}\big[35f^4 + 21f^5 + 7f^6 + f^7\big] = \frac{1}{(f+1)^7}\Big(\big((f+7)f + 21\big)f + 35\Big)f^4. $$


E.6  Ordered  Versus  Unordered  Eigenvalues 

I  supplied  this  section. 

When you study basic probability, an early exercise examines counting rules, permutations, and combinations. Olkin and Derman (p. 67, Counting Rule 5.3) [200] tell us that the number of distinguishable arrangements of $n$ items, $n_1$ of which are of type 1, $n_2$ of which are of type 2, ..., $n_k$ of which are of type $k$, is given by

$$ \frac{n!}{n_1!\,n_2!\cdots n_k!}. $$

When you look at the density function of a vector random variable

$$ z = (z_1, z_2, \ldots, z_n)^T, $$

you are including the specification of the order, an $n$-tuple $(z_1, z_2, \ldots, z_n)$. This is treated differently from $(z_2, z_1, \ldots, z_n)$. For our complex vector normal distribution, interchanging vector elements would yield a different covariance matrix.

We are interested in looking at an ordered set of eigenvalues. When we decompose our sample covariance matrix, we get something like

$$ W = UL^2U^H = (u_1, \ldots, u_p)\,\mathrm{diag}(l_1^2, \ldots, l_p^2)\,(u_1, \ldots, u_p)^H = \sum_{k=1}^{p} l_k^2\,u_ku_k^H. $$

As you know, the order of summation is unimportant (addition is commutative). As long as you interchange positions of the eigenvectors to the same ordering as the eigenvalues, the sum of the products will remain the same. Let $\sigma$ be a permutation of the index set $(1, 2, \ldots, p)$. Then

$$ \sum_{j=1}^{p} l_{\sigma(j)}^2\,u_{\sigma(j)}u_{\sigma(j)}^H = \sum_{k=1}^{p} l_k^2\,u_ku_k^H = W. $$

When $l_i^2 \ne l_j^2$ for all pairs $i \ne j$, there are $p!$ orderings of the paired eigenvalues and eigenvectors. Given that the density function for the ordered set of eigenvalues $(l_1^2 > l_2^2 > \cdots > l_p^2)$ is $g(L^2)$, the density function for the unordered set of eigenvalues $(l_{\sigma(1)}^2, \ldots, l_{\sigma(p)}^2)$ is $\frac{1}{p!}\,g(L^2)$.


Appendix  F 
RESULTS FOR SIGNAL PROCESSING

This appendix provides results that are either directly related to known signal processing needs or one step removed from those needs. Many of these results are a complexification of Arnold's section 17.7 [31]. It includes studies of forms involving the matrix complex normal distribution, the complex Wishart distribution, and functions of these forms. Most of the statistical work done in this thesis bears directly on making a path to the answer to the thesis question. With just a small amount of extra work, it was possible to produce results of value to other portions of the acoustic signal processing community. Many of those results are presented in this appendix. These forms include the trace, determinant, inverse, and some selected ratios. For completeness' sake, at the end of the appendix, some beamforming results by Tague will be presented to show the usefulness of these methods.

This appendix builds on itself as it goes along. Although you can get a highlight of results by looking at the theorem statements, the way to see how the results fit together is to start from the beginning.

F.1 Trace Distributions

Theorem 72 Let $Z \sim CN_{m,r}(\mu, \Sigma, S)$ and let $T$ be an $m \times r$ complex matrix. Then

$$ \operatorname{tr}(T^HZ) \sim CN_1\big(\operatorname{tr}(T^H\mu),\ \operatorname{tr}(T^H\Sigma TS)\big). $$

This is a complexification of Arnold's theorem 17.13(a) [31], which was stated without proof.

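Proof. From equation D.9, recall the characteristic function for $Z$:

$$ \Phi_Z(T) = \exp\Big\{i\,\mathrm{Re}\big(\operatorname{tr}[T^H\mu]\big) - \tfrac{1}{4}\operatorname{tr}[T^H\Sigma TS]\Big\}. $$

Let $u = \operatorname{tr}(T^HZ)$. Then

$$ \Phi_u(t) = E\{\exp[i\,\mathrm{Re}(t^*u)]\}, $$

where $t$ is a scalar and therefore commutes. Then

$$ \Phi_u(t) = E\{\exp[i\,\mathrm{Re}(t^*\operatorname{tr}[T^HZ])]\} = E\{\exp[i\,\mathrm{Re}(\operatorname{tr}[(tT)^HZ])]\} = \Phi_Z(tT) = \Phi_{\operatorname{tr}(T^HZ)}(t), $$

since $u = \operatorname{tr}(T^HZ)$.

Alternately, by theorem 19,

$$ \Phi_{\operatorname{tr}(T^HZ)}(t) = \Phi_{T^HZ}(tI_r) = \Phi_Z(TtI_r) = \Phi_Z(tT), $$

with the last step justified by theorem 18.

Let $\Gamma = tT$. Then $\Phi_Z(tT) = \Phi_Z(\Gamma)$. Note that $\Gamma$ is a matrix of the same dimensions as $T$. The transform variable $t$ is a scalar.

$$ \begin{aligned} \Phi_Z(\Gamma) &= \exp\Big\{i\,\mathrm{Re}\big(\operatorname{tr}[\Gamma^H\mu]\big) - \tfrac{1}{4}\operatorname{tr}[\Gamma^H\Sigma\Gamma S]\Big\} \\ &= \exp\Big\{i\,\mathrm{Re}\big(\operatorname{tr}[(tT)^H\mu]\big) - \tfrac{1}{4}\operatorname{tr}[(tT)^H\Sigma(tT)S]\Big\} \\ &= \exp\Big\{i\,\mathrm{Re}\big(t^*\operatorname{tr}[T^H\mu]\big) - \tfrac{1}{4}|t|^2\operatorname{tr}[T^H\Sigma TS]\Big\}. \end{aligned} $$

This is the characteristic function of a scalar complex normal random variable with mean $\operatorname{tr}(T^H\mu)$ and variance $\operatorname{tr}(T^H\Sigma TS)$. Therefore,

$$ \operatorname{tr}(T^HZ) \sim CN_1\big(\operatorname{tr}[T^H\mu],\ \operatorname{tr}[T^H\Sigma TS]\big). $$

□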
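Theorem 72 can be checked by Monte Carlo simulation. The Python sketch below rests on an assumption not stated in the theorem itself: that $Z \sim CN_{m,r}(\mu, \Sigma, S)$ can be generated as $Z = \mu + AZ_0B$, where $Z_0$ has independent $CN_1(0, 1)$ entries, $\Sigma = AA^H$ and $S = B^HB$ (the factorization used in the proof of Theorem 73). Helper names are hypothetical. It compares the sample mean and variance of $\operatorname{tr}(T^HZ)$ with $\operatorname{tr}(T^H\mu)$ and $\operatorname{tr}(T^H\Sigma TS)$.

```python
import random


def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def herm(X):
    """Conjugate transpose X^H."""
    return [[X[j][i].conjugate() for j in range(len(X))] for i in range(len(X[0]))]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def draw_Z(mu, A, B, rng):
    """Assumed construction Z = mu + A Z0 B, Z0 with iid CN_1(0, 1) entries."""
    s = 0.5 ** 0.5  # real and imaginary parts each N(0, 1/2)
    m, r = len(mu), len(mu[0])
    Z0 = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(r)] for _ in range(m)]
    AZB = matmul(matmul(A, Z0), B)
    return [[mu[i][j] + AZB[i][j] for j in range(r)] for i in range(m)]


def predicted_moments(mu, A, B, T):
    """Mean tr(T^H mu) and variance tr(T^H Sigma T S) as stated in Theorem 72."""
    Sigma, S = matmul(A, herm(A)), matmul(herm(B), B)
    mean = trace(matmul(herm(T), mu))
    var = trace(matmul(matmul(matmul(herm(T), Sigma), T), S)).real
    return mean, var
```

The variance here is $E|u - E\{u\}|^2$ for the complex scalar $u = \operatorname{tr}(T^HZ)$, matching the second parameter of $CN_1$.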

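Lemma 21 Let $Z \sim CN_{m,r}(\mu, \sigma^2 I, I)$. Then

$$ \frac{2}{\sigma^2}\operatorname{tr}[Z^HZ] \sim \chi^2_{2mr}\Big(\frac{2}{\sigma^2}\operatorname{tr}[\mu^H\mu]\Big), $$

and when $\sigma^2 = 1$ this is

$$ 2\operatorname{tr}[Z^HZ] \sim \chi^2_{2mr}\big(2\operatorname{tr}[\mu^H\mu]\big). $$

This is a special case of the complexification of Arnold's theorem 17.13(b) [31] (which was stated without proof), and will be used in the proof of the more general case.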

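Proof. Let $Z = (Z_{ij})$ and $\mu = (\mu_{ij})$, where the $Z_{ij}$ are independent and $Z_{ij} \sim CN_1(\mu_{ij}, 1)$. Then $Z \sim CN_{m,r}(\mu, I, I)$ and $Z^HZ \sim CW_r(m, I, \Theta)$, where $\Theta = \mu^H\mu$. Consider $\operatorname{tr}(Z^HZ)$ directly:

$$ \operatorname{tr}(Z^HZ) = \sum_{i=1}^{r} z_i^Hz_i = \sum_{i=1}^{r}\sum_{j=1}^{m}|Z_{ji}|^2, $$

where $Z_{m\times r} = (z_1, \ldots, z_r)$ and $z_i = (Z_{1i}, \ldots, Z_{mi})^T$. Note that this is the same answer as you get when you vectorize $Z$. Let

$$ \operatorname{vec}(Z) = \begin{pmatrix} z_1 \\ \vdots \\ z_r \end{pmatrix}. $$

Then $\operatorname{tr}(Z^HZ) = \operatorname{vec}(Z)^H\operatorname{vec}(Z)$. Similarly, $\operatorname{tr}(\mu^H\mu) = \operatorname{vec}(\mu)^H\operatorname{vec}(\mu)$ when $\mu$ is vectorized in the same way. Then $\operatorname{vec}(Z)_{mr\times 1} \sim CN_{mr}(\operatorname{vec}(\mu), \sigma^2 I)$, where $\sigma^2 = 1$. By theorem 53 and lemma 15 we know

$$ \operatorname{vec}(Z)^H\operatorname{vec}(Z) \sim C\chi'^2_{mr}(\delta, \sigma^2), $$

where

$$ \delta = \operatorname{vec}(\mu)^H\operatorname{vec}(\mu) = \operatorname{tr}(\mu^H\mu). $$

Therefore,

$$ \frac{2}{\sigma^2}\operatorname{tr}(Z^HZ) \sim \chi^2_{2mr}\Big(\frac{2}{\sigma^2}\operatorname{tr}(\mu^H\mu)\Big) = \chi^2_{2mr}\big(2\operatorname{tr}(\mu^H\mu)\big), $$

where $\sigma^2 = 1$. □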
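Lemma 21 can be spot-checked by simulation. The Python sketch below is illustrative (hypothetical names, standard library only): it assumes $CN_1(\mu, 1)$ means real and imaginary parts that are independent normals with variance 1/2 each, draws $2\operatorname{tr}(Z^HZ)$ many times, and compares the sample mean and variance with the noncentral chi-squared values $k + \lambda$ and $2(k + 2\lambda)$, where $k = 2mr$ and $\lambda = 2\operatorname{tr}(\mu^H\mu)$.

```python
import random
import statistics


def draw_stat(mu, rng):
    """One draw of 2 tr(Z^H Z) for independent Z_ij ~ CN_1(mu_ij, 1).

    Assumed convention: real and imaginary parts of Z_ij - mu_ij are independent
    N(0, 1/2), so that E|Z_ij - mu_ij|^2 = 1 (sigma^2 = 1 in Lemma 21).
    """
    s = 0.5 ** 0.5
    total = 0.0
    for row in mu:
        for m_ij in row:
            z = complex(m_ij.real + rng.gauss(0, s), m_ij.imag + rng.gauss(0, s))
            total += abs(z) ** 2
    return 2.0 * total


def noncentral_chi2_moments(k, lam):
    """Mean k + lam and variance 2(k + 2 lam) of a noncentral chi-squared variable."""
    return k + lam, 2.0 * (k + 2.0 * lam)
```

With a $2 \times 3$ mean matrix whose squared entries sum to 2.25, the prediction is a mean of $12 + 4.5 = 16.5$ and a variance of $2(12 + 9) = 42$.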

Theorem 73 Let $Z \sim CN_{m,r}(\mu, \Sigma, S)$, with Hermitian positive definite $\Sigma$ and $S$. Then

$$ 2\operatorname{tr}\big[(Z-\mu)^H\Sigma^{-1}(Z-\mu)S^{-1}\big] \sim \chi^2_{2mr}\big[2\operatorname{tr}(\mu^H\Sigma^{-1}\mu S^{-1})\big]. $$

This is a complexification of the first part of Arnold's theorem 17.13(b) [31], which was stated without proof. The real case for my result differs from Arnold's result.

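Proof. Let $\Sigma = AA^H$ and $S = B^HB$, which we know by theorem 119. Then

$$ \begin{aligned} \operatorname{tr}\big[(Z-\mu)^H\Sigma^{-1}(Z-\mu)S^{-1}\big] &= \operatorname{tr}\big[(Z-\mu)^H(AA^H)^{-1}(Z-\mu)(B^HB)^{-1}\big] \\ &= \operatorname{tr}\big[(Z-\mu)^HA^{-H}A^{-1}(Z-\mu)B^{-1}B^{-H}\big] = \operatorname{tr}\big[B^{-H}(Z-\mu)^HA^{-H}A^{-1}(Z-\mu)B^{-1}\big] \\ &= \operatorname{tr}\big([A^{-1}(Z-\mu)B^{-1}]^H[A^{-1}(Z-\mu)B^{-1}]\big) = \operatorname{tr}(\tilde Z^H\tilde Z), \end{aligned} $$

where $\tilde Z = A^{-1}(Z-\mu)B^{-1}$. By theorem 41, since $Z \sim CN_{m,r}(\mu, \Sigma, S)$,

$$ \tilde Z = A^{-1}_{m\times m}(Z-\mu)_{m\times r}B^{-1}_{r\times r} \sim CN_{m,r}(A^{-1}\mu B^{-1}, I, I). $$

Then by lemma 21 we know

$$ \begin{aligned} 2\operatorname{tr}(\tilde Z^H\tilde Z) &\sim \chi^2_{2mr}\big[2\operatorname{tr}\{(A^{-1}\mu B^{-1})^H(A^{-1}\mu B^{-1})\}\big] = \chi^2_{2mr}\big[2\operatorname{tr}\{(A^{-1}\mu B^{-1})(A^{-1}\mu B^{-1})^H\}\big] \\ &= \chi^2_{2mr}\big[2\operatorname{tr}\{A^{-1}\mu B^{-1}B^{-H}\mu^HA^{-H}\}\big] = \chi^2_{2mr}\big[2\operatorname{tr}\{\mu(B^HB)^{-1}\mu^H(AA^H)^{-1}\}\big] \\ &= \chi^2_{2mr}\big[2\operatorname{tr}\{\mu S^{-1}\mu^H\Sigma^{-1}\}\big] = \chi^2_{2mr}\big[2\operatorname{tr}\{\mu^H\Sigma^{-1}\mu S^{-1}\}\big]. \end{aligned} $$

Therefore

$$ 2\operatorname{tr}\big[(Z-\mu)^H\Sigma^{-1}(Z-\mu)S^{-1}\big] \sim \chi^2_{2mr}\big[2\operatorname{tr}\{\mu^H\Sigma^{-1}\mu S^{-1}\}\big], $$

which concludes the proof. □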

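Proposition 40 Let $Z_{m\times r} \sim CN_{m,r}(\mu, \Sigma, S)$, with Hermitian positive definite $\Sigma_{m\times m}$ and $S_{r\times r}$. Let

$$ \Sigma^{-1} = C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} $$

and

$$ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, $$

partitioned conformably ($C_{11}$ is $r \times r$ when $m \ge r$; $S_{11}$ is $m \times m$ when $m < r$). Let $T$ be the $m \times r$ matrix with ones on its main diagonal and zeros elsewhere. Define the trace of a rectangular matrix to be the trace of that matrix when made square by augmentation with an appropriately chosen zero matrix. Then

$$ \operatorname{tr}(T^H_{r\times m}\Sigma^{-1}_{m\times m}Z_{m\times r}) = \operatorname{tr}(\Sigma^{-1}Z) \sim \begin{cases} CN_1\big(\operatorname{tr}(T^H\Sigma^{-1}\mu), \operatorname{tr}(C_{11}S)\big) = CN_1\big[\operatorname{tr}(\Sigma^{-1}\mu), \operatorname{tr}(C_{11}S)\big] & \text{for } m \ge r, \\ CN_1\big(\operatorname{tr}(T^H\Sigma^{-1}\mu), \operatorname{tr}(\Sigma^{-1}S_{11})\big) = CN_1\big[\operatorname{tr}(\Sigma^{-1}\mu), \operatorname{tr}(\Sigma^{-1}S_{11})\big] & \text{for } m < r. \end{cases} $$

Note that the argument of the trace function here is not a square matrix. The trace function is usually defined only for square matrices. Finally,

$$ 2\operatorname{tr}(Z^H\Sigma^{-1}ZS^{-1}) \sim \chi^2_{2mr}\big[2\operatorname{tr}(\mu^H\Sigma^{-1}\mu S^{-1})\big]. $$

This is a complexification and extension of the errata to Arnold's theorem 17.13(b) [31], which was stated without proof.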
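The rectangular-trace convention of Proposition 40 amounts to summing the main-diagonal entries $X_{ii}$ for $i \le \min(m, r)$. The Python sketch below (hypothetical names) checks that this equals the ordinary trace of the square matrix $T^HX$ for the selector matrix $T$ with ones on its main diagonal, which is how the proof uses it.

```python
def rect_trace(X):
    """Trace of an m x r matrix after zero-padding it to square: sum of X[i][i], i < min(m, r)."""
    return sum(X[i][i] for i in range(min(len(X), len(X[0]))))


def selector(m, r):
    """The m x r matrix T of the proof: [I_r; 0] if m >= r, [I_m, 0] if m < r."""
    return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(m)]


def trace_TH_X(T, X):
    """Ordinary trace of the r x r matrix T^H X: sum_i sum_k conj(T_ki) X_ki."""
    m, r = len(T), len(T[0])
    return sum(T[k][i].conjugate() * X[k][i] for i in range(r) for k in range(m))
```

For a $3 \times 2$ matrix only $X_{11}$ and $X_{22}$ contribute, exactly as in the $x$ pattern shown in the proof.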


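Proof. First note that $\Sigma = \Sigma^H$. Since $Z \sim CN_{m,r}(\mu, \Sigma, S)$, then by theorem 41 we know $\Sigma^{-1}Z \sim CN_{m,r}(\Sigma^{-1}\mu, \Sigma^{-1}, S)$. Consider $\operatorname{tr}(T^H\Sigma^{-1}Z)$, where $T$ is $m \times r$. We consider two cases, based on the comparison of $m$ and $r$.

First, let $m \ge r$. Define

$$ T = \begin{pmatrix} I_r \\ 0_{(m-r)\times r} \end{pmatrix}. $$

Then

$$ T^H\Sigma^{-1}Z = \begin{pmatrix} I_r & 0_{r\times(m-r)} \end{pmatrix}\Sigma^{-1}Z, $$

which consists of the first $r$ rows of $\Sigma^{-1}Z$. Also,

$$ T^H\Sigma^{-1}T = \begin{pmatrix} I_r & 0_{r\times(m-r)} \end{pmatrix}\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}\begin{pmatrix} I_r \\ 0_{(m-r)\times r} \end{pmatrix} = C_{11}. $$

Therefore by theorem 41

$$ T^H\Sigma^{-1}Z \sim CN_{r,r}(T^H\Sigma^{-1}\mu, C_{11}, S). $$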
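Notice that $\operatorname{tr}(T^H\Sigma^{-1}Z)$ is the sum of the elements down the main diagonal (identified by $x$ in the matrix below) of the $m \times r$ rectangular array $\Sigma^{-1}Z$, shown here for $m = 6$ and $r = 4$ after augmentation with zero columns:

$$ \begin{pmatrix} x & y & y & y & 0 & 0 \\ y & x & y & y & 0 & 0 \\ y & y & x & y & 0 & 0 \\ y & y & y & x & 0 & 0 \\ y & y & y & y & 0 & 0 \\ y & y & y & y & 0 & 0 \end{pmatrix} $$

Let us define this as $\operatorname{tr}[\Sigma^{-1}Z]$. Now, consider $\operatorname{tr}(T^H\Sigma^{-1}TS) = \operatorname{tr}(C_{11}S)$. From our study of characteristic functions, recall from theorem 19 that $\Phi_{T^H\Sigma^{-1}Z}(\Gamma) = \Phi_{\Sigma^{-1}Z}(T\Gamma)$. Now, by equation D.9,

$$ \Phi_{T^H\Sigma^{-1}Z}(\Gamma) = \exp\Big\{i\,\mathrm{Re}\big[\operatorname{tr}(\Gamma^HT^H\Sigma^{-1}\mu)\big] - \tfrac{1}{4}\operatorname{tr}(\Gamma^HC_{11}\Gamma S)\Big\}. $$

Now, let $\Gamma = tI$ to obtain the characteristic function of the trace of $T^H\Sigma^{-1}Z$. We get

$$ \begin{aligned} \Phi_{\operatorname{tr}(T^H\Sigma^{-1}Z)}(t) &= \exp\Big\{i\,\mathrm{Re}\big[\operatorname{tr}(t^*T^H\Sigma^{-1}\mu)\big] - \tfrac{1}{4}\operatorname{tr}(t^*C_{11}tS)\Big\} \\ &= \exp\Big\{i\,\mathrm{Re}\big[t^*\operatorname{tr}(T^H\Sigma^{-1}\mu)\big] - \tfrac{1}{4}|t|^2\operatorname{tr}(C_{11}S)\Big\}. \end{aligned} $$

This is the characteristic function of the distribution

$$ CN_1\big(\operatorname{tr}(T^H\Sigma^{-1}\mu),\ \operatorname{tr}(C_{11}S)\big). $$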
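By our definition for the trace of a rectangular matrix, we can call this

$$ CN_1\big(\operatorname{tr}(\Sigma^{-1}\mu),\ \operatorname{tr}(C_{11}S)\big). $$

Now, let $r > m$. Define

$$ T = \begin{pmatrix} I_m & 0_{m\times(r-m)} \end{pmatrix}. $$

Then

$$ T^H\Sigma^{-1}Z = \begin{pmatrix} I_m \\ 0_{(r-m)\times m} \end{pmatrix}\Sigma^{-1}Z = \begin{pmatrix} \Sigma^{-1}Z \\ 0_{(r-m)\times r} \end{pmatrix}. $$

We further note that

$$ T^H\Sigma^{-1}T = \begin{pmatrix} I_m \\ 0_{(r-m)\times m} \end{pmatrix}\Sigma^{-1}\begin{pmatrix} I_m & 0_{m\times(r-m)} \end{pmatrix} = \begin{pmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix}. $$

By theorem 41 we know

$$ T^H\Sigma^{-1}Z \sim CN_{r,r}\left(T^H\Sigma^{-1}\mu,\ \begin{pmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix},\ S\right). $$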

We  also  observe  that 


T^E~^TE  = 


/  \ 

(  ,  'l 

/  \ 

E-‘  0 

E"^  0 

Ell  Ei2 

= 

E  = 

<  0  o> 

.  0  0, 

^  E21  E22  ^ 

r,  ti\l  —  iil2 


0 


0 
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From  this  we  see  that  tr(r^E“^TE)  =  tr(H“^Sii).  Once  again  examining  the 
characteristic  function,  we  now  change  only  the  definition  of  T  and  we  observe 

=  exp  Re  [tr  (r^r^E  'V)]  ~  ^ 


1  i  Re 

tr 

/ 

E-V 

\ 

1 

fs-' 

\ 

0 

tE 

1 

1 

0 

y 

^  « 

by  equation  D.9.  To  obtain  the  characteristic  function  of  tr(r^E  ^S)  we  once 
again  let  r  =  tl.  This  gives  us 


( 

“ 

/ 

r  1 

\ 

“ 

“  “ 

1 

I 

[I] 

1  ,  .  2 

E-'En  E-IE12 

^TH-=-iz{tI)  =  exp  <  i  Re 

tr 

t* 

--liitr 

1 

V 

o 

0  0 

J 

f 

■ 

/ 

• 

■ 

1 

E'V 

1  .0  r  •  1  1 

=  exp  <  i  Re 

tr 

f 

--|(ptr  E->E„ 

I 

\ 

0 

J 

^  1 

f 

1  ~  1  ) 

\ 

This  is  the  characteristic  function  of  CA^i 

tr 

,tr(E-iE„) 

V 

.  0  y 

/ 

You  can  observe  that  this  theorem  can  be  generalized  easily  by  application 
of  the  Singular  Value  Decomposition. 

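Now we want the distribution of $\operatorname{tr}(Z^H\Sigma^{-1}Z\Psi^{-1})$. Recall that $Z \sim CN_{m,r}(\mu, \Sigma, \Psi)$. Let $\Sigma = AA^H$ and $\Psi = B^HB$; these factorizations exist by theorem 119. Then
$$\begin{aligned}\operatorname{tr}(Z^H\Sigma^{-1}Z\Psi^{-1}) &= \operatorname{tr}\left(Z^H(AA^H)^{-1}Z(B^HB)^{-1}\right) = \operatorname{tr}(Z^HA^{-H}A^{-1}ZB^{-1}B^{-H})\\ &= \operatorname{tr}(B^{-H}Z^HA^{-H}A^{-1}ZB^{-1}) = \operatorname{tr}\left((A^{-1}ZB^{-1})^H(A^{-1}ZB^{-1})\right) = \operatorname{tr}(Y^HY)\end{aligned}$$
where $Y = A^{-1}ZB^{-1}$. Note that
$$Y \sim CN_{m,r}(A^{-1}\mu B^{-1}, I, I)$$
by theorem 41. By lemma 21 we know
$$2\operatorname{tr}(Y^HY) = 2\operatorname{tr}(Z^H\Sigma^{-1}Z\Psi^{-1}) \sim \chi'^2_{2mr}\left[2\operatorname{tr}(\mu^H\Sigma^{-1}\mu\Psi^{-1})\right].$$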

Theorem 74 Let $W \sim CW_p(n, \Sigma, \Theta)$ with Hermitian positive definite $\Sigma$. Then $2\operatorname{tr}(\Sigma^{-1}W) \sim \chi'^2_{2np}[2\operatorname{tr}(\Sigma^{-1}\Theta)]$. This is a complexification of Arnold's theorem 17.14 [31], which was stated without proof, and it also generalizes the complex version of the distributional result of Muirhead's theorem 3.2.20 [187].

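Proof. Because $\Sigma$ is positive definite, there exists a decomposition $\Sigma = B^HB$ with $B$ nonsingular, by theorem 119. From the definition of $W$, let $W = Z^HZ$ where $Z \sim CN_{n,p}(\mu, I, \Sigma)$. $\Sigma$ is $p \times p$ and $Z$ is $n \times p$. Then
$$\operatorname{tr}(\Sigma^{-1}W) = \operatorname{tr}[(B^HB)^{-1}(Z^HZ)] = \operatorname{tr}(B^{-1}B^{-H}Z^HZ) = \operatorname{tr}(B^{-H}Z^HZB^{-1}) = \operatorname{tr}\left((ZB^{-1})^H(ZB^{-1})\right).$$
Let $Y = ZB^{-1}$. The new variable $Y$ is $n \times p$. Then
$$Y \sim CN_{n,p}(\mu B^{-1}, I, B^{-H}\Sigma B^{-1}) = CN_{n,p}(\mu B^{-1}, I, I)$$
by theorem 41. We note that
$$\operatorname{tr}[(\mu B^{-1})^H(\mu B^{-1})] = \operatorname{tr}(B^{-H}\mu^H\mu B^{-1}) = \operatorname{tr}(B^{-1}B^{-H}\mu^H\mu) = \operatorname{tr}(\Sigma^{-1}\Theta)$$
where $\Theta = \mu^H\mu$ and $\Sigma^{-1} = (B^HB)^{-1}$. Then
$$2\operatorname{tr}(Y^HY) = 2\operatorname{tr}(\Sigma^{-1}W) \sim \chi'^2_{2np}[2\operatorname{tr}(\Sigma^{-1}\Theta)]$$
by lemma 21.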
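Theorem 74 can be checked by simulation. The sketch below is a Monte Carlo check under simplifying assumptions that are not part of the theorem: $\Sigma$ is taken diagonal (so the entries of $Z$ are independent), the values of $\mu$ and $\Sigma$ are arbitrary illustrations, and $CN_1(m, \sigma^2)$ is sampled with independent real and imaginary parts of variance $\sigma^2/2$. It compares the sample mean and variance of $2\operatorname{tr}(\Sigma^{-1}W)$ with $k + \delta$ and $2k + 4\delta$, the mean and variance of a noncentral $\chi'^2_k[\delta]$, where $k = 2np$ and $\delta = 2\operatorname{tr}(\Sigma^{-1}\Theta)$. The helper names are hypothetical.

```python
import random


def draw_statistic(mu, sigma2, rng):
    """One draw of 2 tr(Sigma^{-1} W), W = Z^H Z, Z ~ CN_{n,p}(mu, I, Sigma).

    Assumption of this sketch: Sigma = diag(sigma2), so the entries of Z are
    independent and tr(Sigma^{-1} W) = sum_{i,j} |Z_ij|^2 / sigma2[j].
    """
    total = 0.0
    for row in mu:
        for j, m in enumerate(row):
            sd = (sigma2[j] / 2.0) ** 0.5  # real and imaginary parts each carry half the variance
            z = m + complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))
            total += abs(z) ** 2 / sigma2[j]
    return 2.0 * total


def chi2_parameters(mu, sigma2):
    """Degrees of freedom 2np and noncentrality 2 tr(Sigma^{-1} Theta), Theta = mu^H mu."""
    n, p = len(mu), len(mu[0])
    delta = 2.0 * sum(abs(row[j]) ** 2 / sigma2[j] for row in mu for j in range(p))
    return 2 * n * p, delta


# Illustrative (hypothetical) parameters: n = 3 rows, p = 2 columns.
mu = [[1 + 1j, 0.5], [0.0, -1j], [0.3, 0.2 - 0.4j]]
sigma2 = [2.0, 0.5]

rng = random.Random(2024)
draws = [draw_statistic(mu, sigma2, rng) for _ in range(40_000)]
df, delta = chi2_parameters(mu, sigma2)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean {sample_mean:.3f} vs {df + delta:.3f}; "
      f"variance {sample_var:.2f} vs {2 * df + 4 * delta:.2f}")
```

With these parameters $k = 12$ and $\delta = 7.89$, so the sample mean should sit near $19.89$ and the sample variance near $55.56$.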

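Lemma 22 Let $W \sim CW_p(n, \Sigma, \Theta)$ where $\Sigma$ has the eigenvalue decomposition $\Sigma = U\Lambda^2U^H$. Then
$$2\operatorname{tr}(\Lambda^{-2}U^HWU) \sim \chi'^2_{2np}[2\operatorname{tr}(\Lambda^{-2}U^H\Theta U)].$$

Proof. Note that $\Lambda^2 = U^H\Sigma U$. By the definition of the complex Wishart distribution, let $W = Z^HZ$ where $Z \sim CN_{n,p}(\mu, I, \Sigma)$. By theorem 54 we know
$$U^HWU \sim CW_p(n, \Lambda^2, U^H\Theta U).$$
By theorem 74, we get the result
$$2\operatorname{tr}(\Lambda^{-2}U^HWU) \sim \chi'^2_{2np}[2\operatorname{tr}(\Lambda^{-2}U^H\Theta U)].$$
□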


F.2 Characteristic Function of the Complex Wishart Distribution

Theorem 75 Let $W \sim CW_p(n, \Sigma)$, $\Sigma > 0$. Let $\tilde W = 2W - \Delta(W)$ where $\Delta(W)$ is a diagonal matrix consisting of the elements on the main diagonal of $W$. Then $\tilde W$ has the characteristic function
$$\phi_{\tilde W}(T) = [\det(I_p - i\Sigma T)]^{-n}$$
where $T^H = T \in \mathbb{C}^{p\times p}$, and $\tilde W$ has the joint distribution of the random variables
$$(W_{11}, W_{22}, \ldots, W_{pp}, 2W_{R12}, 2W_{I12}, \ldots, 2W_{R(p-1),p}, 2W_{I(p-1),p}).$$
This is Goodman equation 1.7 [92], Eaton proposition 8.3(iii) (p. 305) [74], and the complexification of Arnold's theorem 17.15 [31].

Proof. As a preamble, note explicitly that we are not obtaining the characteristic function of $W$. This is not the characteristic function of the joint distribution of the random variables
$$(W_{11}, \ldots, W_{R12}, W_{I12}, \ldots, W_{R(p-1),p}, W_{I(p-1),p}).$$
This tradition was also honored by other authors in deriving transforms for the real Wishart distribution. The characteristic function for $\tilde W$ is useful for studying some transformations of variables, but great attention to detail is necessary if it is to be used for computing expected values of moments. This is because the partial derivative with respect to an individual element $T_{ij}$ does not exist when $i \neq j$. This is a result of imposing the condition $T = T^H$, which is used to justify the existence of an eigenvalue decomposition in equation F.1.

This is a complexification and expansion of Eaton's proof. Let $Z \sim CN_p(0, I)$ and $C \in \mathbb{C}^{p\times p}$. Then
$$X = CZ \sim CN_p(0, CC^H) = CN_p(0, \Sigma)$$
where $\Sigma = CC^H$ by theorem 119. Let $\{X_j\}_{j=1}^n$ be a random sample of size $n \geq p$. Then
$$W = \sum_{j=1}^n X_jX_j^H$$
where $X_j$ is a column vector. We want to find the characteristic function for $W$. Note that $W = W^H > 0$. Thus, let the argument of the characteristic function have the property $T = T^H > 0$. This will make the answer come out in a nice form.

It turns out that we are not deriving the characteristic function of $W$, but rather the characteristic function of a related matrix variable which I will call $\tilde W$. The transformation matrix $T$ in my proof is called $A$ in Eaton's proof.
$$\phi_{\tilde W}(T) = \mathcal{E}\{\exp[i\operatorname{Re}(\operatorname{tr}(T^H\tilde W))]\} = \mathcal{E}\{\exp[i\operatorname{Re}(\operatorname{tr}(TW))]\}$$
since $T^H = T$.

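$$\begin{aligned}\phi_{\tilde W}(T) &= \mathcal{E}\left\{\exp\left[i\operatorname{Re}\left(\operatorname{tr}\left(T\sum_{j=1}^nX_jX_j^H\right)\right)\right]\right\} = \mathcal{E}\left\{\prod_{j=1}^n\exp[i\operatorname{Re}(\operatorname{tr}(TX_jX_j^H))]\right\}\\ &= \prod_{j=1}^n\mathcal{E}\{\exp[i\operatorname{Re}(\operatorname{tr}(TX_jX_j^H))]\}\end{aligned}$$
since the $\{X_j\}$ are independent. Further, since the $\{X_j\}$ are identically distributed, for your favorite $j$ we can say this equals
$$[\mathcal{E}\{\exp[i\operatorname{Re}(\operatorname{tr}(TX_jX_j^H))]\}]^n = [\mathcal{E}\{\exp[i\operatorname{Re}(\operatorname{tr}(X_j^HTX_j))]\}]^n = [\mathcal{E}\{\exp[i\operatorname{Re}(X_j^HTX_j)]\}]^n$$
because we recognize that $X_j^HTX_j$ is a scalar and thus equal to its trace. We drop the subscript $j$ and use the lower case $x$ to signify our generic independent identically distributed vector. Let $x = Cz$ and $B = C^HTC$. Then
$$\phi_{\tilde W}(T) = [\mathcal{E}\{\exp[i\operatorname{Re}(z^HC^HTCz)]\}]^n = [\mathcal{E}\{\exp[i\operatorname{Re}(z^HBz)]\}]^n.$$
Note that $B^H = B > 0$, so by theorem 115 it has an eigenvalue decomposition
$$B = \Gamma\Lambda^2\Gamma^H. \qquad \text{(F.1)}$$
Thus
$$\phi_{\tilde W}(T) = [\mathcal{E}\{\exp[i\operatorname{Re}(z^H\Gamma\Lambda^2\Gamma^Hz)]\}]^n = [\mathcal{E}\{\exp[i\operatorname{Re}(y^H\Lambda^2y)]\}]^n$$
where $y = \Gamma^Hz$. Note that
$$y = \Gamma^Hz \sim CN_p(0, \Gamma^H\Gamma) = CN_p(0, I).$$
This is the same distribution that $z$ has. Thus, we can write
$$\phi_{\tilde W}(T) = [\mathcal{E}\{\exp[i\operatorname{Re}(z^H\Lambda^2z)]\}]^n = \left[\mathcal{E}\left\{\exp\left[i\operatorname{Re}\left(\sum_{k=1}^p\lambda_k^2|z_k|^2\right)\right]\right\}\right]^n$$
where $\{z_k\}_1^p$ are the elements of the complex vector $z$, and are independently distributed as $CN_1(0,1)$. Therefore,
$$\phi_{\tilde W}(T) = \left[\prod_{k=1}^p\mathcal{E}\{\exp(i\lambda_k^2|z_k|^2)\}\right]^n$$
since the $\lambda_k^2$ and $|z_k|^2$ are real-valued. We continue by expressing the expected value in its integral form.
$$\phi_{\tilde W}(T) = \left\{\prod_{k=1}^p\int_{\mathbb{C}}\exp[i\lambda_k^2|z_k|^2]\frac{1}{\pi}\exp[-|z_k|^2]\,dz_k\right\}^n = \left\{\prod_{k=1}^p\frac{1}{1 - i\lambda_k^2}\right\}^n = \prod_{k=1}^p(1 - i\lambda_k^2)^{-n} = [\det(I_p - i\Lambda^2)]^{-n}$$
where $\Lambda^2 = \operatorname{diag}(\lambda_1^2, \ldots, \lambda_p^2)$ and $\det(I - i\Lambda^2) \neq 0$. Since $B^H = B > 0$, all its eigenvalues are real by corollary 34. Therefore $\lambda_k^2 \neq -i$ for any value of $k$, so the determinant always exists.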

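I have lost the pedigree of the proof that $\lambda_j^2$ cannot be purely imaginary. However, it is important, so it is presented. Suppose there exist some $\lambda_j^2, \lambda_l^2 \in \mathbb{R}$ such that $(1 - i\lambda_j^2)(1 - i\lambda_l^2) = 0$. Then
$$1 - i(\lambda_j^2 + \lambda_l^2) - \lambda_j^2\lambda_l^2 = 0$$
which implies, when $\lambda_j^2 + \lambda_l^2 \neq 0$,
$$i = \frac{1 - \lambda_j^2\lambda_l^2}{\lambda_j^2 + \lambda_l^2}.$$
The right-hand side is real, so this is impossible; and if $\lambda_j^2 + \lambda_l^2 = 0$, the equation reduces to $1 + \lambda_j^4 = 0$, which is also impossible. So there can never be such $\lambda_j^2, \lambda_l^2$.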

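Continuing,
$$\begin{aligned}\phi_{\tilde W}(T) &= [\det(I_p)]^{-n}[\det(I_p - i\Lambda^2)]^{-n} = [\det(\Gamma\Gamma^H)]^{-n}[\det(I_p - i\Lambda^2)]^{-n}\\ &= [\det(\Gamma)]^{-n}[\det(I_p - i\Lambda^2)]^{-n}[\det(\Gamma^H)]^{-n}\\ &= [\det(\Gamma\Gamma^H - i\Gamma\Lambda^2\Gamma^H)]^{-n} = [\det(I_p - iB)]^{-n}\end{aligned}$$
since $B = \Gamma\Lambda^2\Gamma^H$. We also recall that $B = C^HTC$, which gives us
$$\phi_{\tilde W}(T) = [\det(I_p - iC^HTC)]^{-n} = [\det(I_p - iCC^HT)]^{-n}$$
by Lemma 47. Then
$$\phi_{\tilde W}(T) = [\det(I_p - i\Sigma T)]^{-n}$$
where $\Sigma = CC^H$. □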
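The key scalar step in the proof of theorem 75 is $\mathcal{E}\{\exp(i\lambda^2|z|^2)\} = (1 - i\lambda^2)^{-1}$ for $z \sim CN_1(0,1)$. Under the density $\pi^{-1}\exp(-|z|^2)$, $|z|^2$ is a standard exponential variable, so the expectation reduces to a one-dimensional integral. The sketch below evaluates that integral with the trapezoid rule (the truncation point and step count are arbitrary choices, and the function names are hypothetical) and compares it with the closed form. It then forms the product over eigenvalues to illustrate $[\det(I_p - i\Lambda^2)]^{-n}$.

```python
import cmath


def scalar_cf_numeric(lam2, upper=60.0, steps=100_000):
    """Trapezoid-rule value of E[exp(i*lam2*|z|^2)] for z ~ CN_1(0, 1).

    |z|^2 ~ Exp(1), so the expectation is the integral over x >= 0 of
    exp((i*lam2 - 1) * x); the neglected tail beyond `upper` is below exp(-upper).
    """
    h = upper / steps
    rate = 1j * lam2 - 1.0
    total = 0.5 * (1.0 + cmath.exp(rate * upper))
    for k in range(1, steps):
        total += cmath.exp(rate * k * h)
    return total * h


def scalar_cf_closed(lam2):
    """Closed form (1 - i*lam2)^{-1} used in the proof."""
    return 1.0 / (1.0 - 1j * lam2)


def wishart_cf_from_eigs(eigs, n):
    """[det(I_p - i Lambda^2)]^{-n} = prod_k (1 - i*lam_k^2)^{-n}."""
    out = 1.0 + 0j
    for lam2 in eigs:
        out *= scalar_cf_closed(lam2) ** n
    return out


for lam2 in (0.25, 1.0, 3.0):
    print(lam2, scalar_cf_numeric(lam2), scalar_cf_closed(lam2))
```

The numerical and closed-form values should agree to roughly six decimal places for these step sizes.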

F.3  Functions  of  a  Wishart  Matrix 

Example 6 $\mathcal{E}\{W\} = n\Sigma$. This result is well known for the case of the real Wishart matrix. The point of this example is the use of the characteristic function of $\tilde W$ to compute the expected value of $W$. It is not quite the trivial exercise one might expect from experience with univariate statistics. Blame this example on me.

Proof. Recall that the characteristic function corresponding to the joint density of
$$(W_{11}, W_{22}, \ldots, W_{pp}, 2W_{R12}, 2W_{I12}, \ldots, 2W_{I(p-1),p})$$
is given by
$$\phi_{\tilde W}(T) = [\det(I_p - i\Sigma T)]^{-n}$$
where $T = T^H$. From the properties of characteristic functions (differentiating $\phi_{\tilde W}$ with respect to the elements of $T$ and evaluating at $T = 0$), we recall that
$$\mathcal{E}\{\tilde W\} = \mathcal{E}\{2W - \Delta(W)\} = n[2\Sigma - \Delta(\Sigma)].$$
Note that $\Delta(\tilde W) = \Delta(W)$, thus $\mathcal{E}\{\Delta(\tilde W)\} = \mathcal{E}\{\Delta(W)\}$. Then
$$\mathcal{E}\{\Delta(\tilde W)\} = n[2\Delta(\Sigma) - \Delta(\Sigma)] = n\Delta(\Sigma) = \mathcal{E}\{\Delta(W)\}.$$
Hence $2\mathcal{E}\{W\} = \mathcal{E}\{\tilde W\} + \mathcal{E}\{\Delta(W)\} = n[2\Sigma - \Delta(\Sigma)] + n\Delta(\Sigma) = 2n\Sigma$. Therefore $\mathcal{E}\{W\} = n\Sigma$.
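Example 6 can also be checked by simulation. The sketch below builds $W = Z^HZ$ from $n$ independent rows $z = eB$ with $e \sim CN_p(0, I)$, so that each row is $CN_p(0, B^HB)$ and $\Sigma = B^HB$. The particular $B$, $n$, sample size, seed and helper names are illustrative assumptions. The sample average of $W$ should approach $n\Sigma$ entry by entry.

```python
import random


def complex_normal(rng):
    """One CN_1(0, 1) draw: independent real and imaginary parts, each N(0, 1/2)."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_wishart(n, B, rng):
    """W = Z^H Z where each row of Z is e B, e ~ CN_p(0, I); hence W ~ CW_p(n, B^H B)."""
    p = len(B)
    W = [[0j] * p for _ in range(p)]
    for _ in range(n):
        e = [complex_normal(rng) for _ in range(p)]
        z = [sum(e[a] * B[a][k] for a in range(p)) for k in range(p)]
        for j in range(p):
            for k in range(p):
                W[j][k] += z[j].conjugate() * z[k]
    return W


def sigma_from(B):
    """Sigma = B^H B."""
    p = len(B)
    return [[sum(B[a][j].conjugate() * B[a][k] for a in range(p)) for k in range(p)]
            for j in range(p)]


B = [[1.0, 0.5 + 0.5j], [0.0, 0.8]]  # hypothetical upper-triangular factor
n, reps = 4, 20_000
rng = random.Random(7)
mean_W = [[0j, 0j], [0j, 0j]]
for _ in range(reps):
    W = sample_wishart(n, B, rng)
    for j in range(2):
        for k in range(2):
            mean_W[j][k] += W[j][k] / reps
Sigma = sigma_from(B)
print(mean_W)
print([[n * s for s in row] for row in Sigma])
```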

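Theorem 76 Let $W \sim CW_p(n, \Sigma)$ and let $a \in \mathbb{C}^p$ be a fixed vector of complex numbers. Then the characteristic function of the quadratic form $a^HWa$ is given by
$$\phi_{a^HWa}(t) = (1 - ia^H\Sigma at)^{-n}.$$

Proof. Applying Goodman equation 1.7 [92] for $\phi_{\tilde W}(T) = [\det(I_p - i\Sigma T)]^{-n}$ with $T = aa^Ht$, we get
$$\phi_{a^HWa}(t) = [\det(I_p - i\Sigma aa^Ht)]^{-n}$$
where $t$ is a scalar (real, since $T = aa^Ht$ must satisfy $T = T^H$). Treating $\Sigma a$ as a $p \times 1$ matrix and $a^H$ as a $1 \times p$ matrix in Eaton lemma 1.35, we get
$$\phi_{a^HWa}(t) = [\det(1 - ia^H\Sigma at)]^{-n} = (1 - ia^H\Sigma at)^{-n}.$$
□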
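Theorem 76 says that $a^HWa$ has the characteristic function of a scalar complex Wishart variable with parameter $a^H\Sigma a$. As a rough numerical check, the sketch below estimates $\mathcal{E}\{\exp(it\,a^HWa)\}$ by simulation at a few real $t$ and compares it with $(1 - ia^H\Sigma at)^{-n}$. The sampler uses the construction $W = Z^HZ$ with rows $eB$; the choices of $B$, $a$, $n$, the values of $t$ and all function names are hypothetical.

```python
import cmath
import random


def complex_normal(rng):
    """One CN_1(0, 1) draw: independent real and imaginary parts, each N(0, 1/2)."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_wishart(n, B, rng):
    """W = Z^H Z with rows e B, e ~ CN_p(0, I), so W ~ CW_p(n, B^H B)."""
    p = len(B)
    W = [[0j] * p for _ in range(p)]
    for _ in range(n):
        e = [complex_normal(rng) for _ in range(p)]
        z = [sum(e[a] * B[a][k] for a in range(p)) for k in range(p)]
        for j in range(p):
            for k in range(p):
                W[j][k] += z[j].conjugate() * z[k]
    return W


def quad_form(a, M):
    """a^H M a for a vector a and a square matrix M."""
    p = len(a)
    return sum(a[j].conjugate() * M[j][k] * a[k] for j in range(p) for k in range(p))


B = [[1.0, 0.5 + 0.5j], [0.0, 0.8]]   # hypothetical factor, Sigma = B^H B
a = [1.0 + 0j, 1j]                    # hypothetical fixed vector
n, reps = 3, 20_000
Sigma = [[sum(B[r][j].conjugate() * B[r][k] for r in range(2)) for k in range(2)]
         for j in range(2)]
s = quad_form(a, Sigma).real          # a^H Sigma a is real because Sigma is Hermitian

rng = random.Random(11)
q_draws = [quad_form(a, sample_wishart(n, B, rng)).real for _ in range(reps)]


def empirical_cf(t):
    """Monte Carlo estimate of E[exp(i t a^H W a)]."""
    return sum(cmath.exp(1j * t * q) for q in q_draws) / reps


def theoretical_cf(t):
    """(1 - i a^H Sigma a t)^{-n} from theorem 76."""
    return (1 - 1j * s * t) ** (-n)


for t in (0.1, 0.3, 0.7):
    print(t, empirical_cf(t), theoretical_cf(t))
```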


Theorem 77 Let $W \sim CW_p(n, \Sigma)$, $\Sigma > 0$. If $n \geq p$, then
$$\frac{\det(W)}{\det(\Sigma)} = \det(\Sigma^{-1}W)$$
has the same distribution as $\prod_{i=1}^pU_i$ where the $U_i$ are independent and $2U_i \sim \chi^2_{2(n-i+1)}$. This is a complexification of Arnold's theorem 17.15(b) [31], which was stated without proof. Goodman [93] gives an alternative proof. It is also a complexification of theorem 7.5.3 of Anderson [26].


Proof. The proof presented here follows the hints given in problem 17.13(a) of Arnold [31], applied to the complex Wishart case. Let $\Sigma^{-1} = CC^H$. This exists by theorem 121. Then
$$\det(\Sigma^{-1}W) = \det(CC^HW) = \det(C^HWC).$$
Since $W \sim CW_p(n, \Sigma)$, then by theorem 54 we know
$$C^HWC \sim CW_p(n, C^H\Sigma C) = CW_p(n, C^HC^{-H}C^{-1}C).$$
Thus, $C^HWC \sim CW_p(n, I_p)$. Let $V = C^HWC$. Note that the partitioned form of $V$ is
$$V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}$$
where $V_{11}$ is $1 \times 1$. By lemma 46,
$$\det V = (\det V_{22})\det(V_{11} - V_{12}V_{22}^{-1}V_{21}) = \det(V_{22})\det(V_{11\cdot2}).$$
By lemma 19, applied to this partition with the $1 \times 1$ leading block $V_{11}$ (whose conditional variance is 1 because $V \sim CW_p(n, I_p)$), we know that
$$2U_1 = 2(V_{11} \mid V_{22}) \sim \chi^2_{2(n-p+1)}(0).$$
By theorem 55, $V_{22} \sim CW_{p-1}(n, I_{p-1})$.

Now, partition $V_{22}$ in the same manner that $V$ was partitioned. Then
$$2U_2 = 2(V_{22} \mid V_{33}) \sim \chi^2_{2(n-p+2)}(0)$$
and $V_{33} \sim CW_{p-2}(n, I_{p-2})$. Repeating this process through the $p$th entry, we observe that $2^p\det(V) = \prod_{i=1}^p(2U_i)$ where $2U_i \sim \chi^2_{2(n-p+i)}(0)$. Reversing the index, we get $2U_i \sim \chi^2_{2(n-i+1)}(0)$, since
$$\{2U_1, \ldots, 2U_p\} \sim \{\chi^2_{2(n-p+1)}, \chi^2_{2(n-p+2)}, \ldots, \chi^2_{2n}\}.$$

Note that this theorem says that the distribution of $\det(W)$ can be considered as
$$\det(W) \sim 2^{-p}(\det\Sigma)\prod_{i=1}^p\chi^2_{2(n-i+1)}$$
and
$$\frac{2^p\det(W)}{\det(\Sigma)} = 2^p\det(\Sigma^{-1}W) \sim \prod_{j=1}^p\chi^2_{2(n-j+1)},$$
where the chi-square variables in each product are independent.
□
Theorem 78 Let $W \sim CW_p(n, \Sigma)$, $\Sigma > 0$. If $n \geq p$, then
$$\mathcal{E}\{\det(W)\} = p!\binom{n}{p}\det(\Sigma).$$
This is a complexification of Arnold's theorem 17.15(c)(i) [31], which was stated without proof.

Proof. By theorem 77, $\det(\Sigma^{-1}W)$ has the same distribution as $\prod_{i=1}^pU_i$ where the $U_i$ are independent and $2U_i \sim \chi^2_{2(n-i+1)}$. By a property of the $\chi^2$ distribution, $\mathcal{E}\{2U_i\} = 2(n-i+1)$. Thus, $\mathcal{E}\{U_i\} = n-i+1$. Continuing,
$$\mathcal{E}\left\{\prod_{i=1}^pU_i\right\} = \prod_{i=1}^p\mathcal{E}\{U_i\}$$
since the $U_i$ are independent. So, we have
$$\mathcal{E}\{\det(\Sigma^{-1}W)\} = \prod_{i=1}^p\mathcal{E}\{U_i\} = n(n-1)\cdots(n-p+1)$$
which implies
$$\mathcal{E}\{\det(W)\} = \det(\Sigma)\,\mathcal{E}\{\det(\Sigma^{-1}W)\} = n(n-1)\cdots(n-p+1)\det(\Sigma).$$
Since $n(n-1)\cdots(n-p+1) = p!\binom{n}{p}$, this is the stated result.
□
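Theorems 77 and 78 can be illustrated together. The sketch below draws $W \sim CW_p(n, I_p)$ directly, computes $\det W$ by Gaussian elimination, and compares its sample mean with $n(n-1)\cdots(n-p+1)$. It also draws $\prod_iU_i$ with $U_i \sim \text{Gamma}(n-i+1, 1)$, which is the same statement as $2U_i \sim \chi^2_{2(n-i+1)}$ in theorem 77, and checks that this product has the same mean. Taking $\Sigma = I_p$, the particular $n$, $p$, sample size and seed, and the helper names are assumptions of the sketch.

```python
import math
import random


def complex_normal(rng):
    """One CN_1(0, 1) draw: independent real and imaginary parts, each N(0, 1/2)."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_wishart_identity(n, p, rng):
    """W = Z^H Z with the n rows of Z i.i.d. CN_p(0, I_p), so W ~ CW_p(n, I_p)."""
    W = [[0j] * p for _ in range(p)]
    for _ in range(n):
        z = [complex_normal(rng) for _ in range(p)]
        for j in range(p):
            for k in range(p):
                W[j][k] += z[j].conjugate() * z[k]
    return W


def det(M):
    """Determinant of a square complex matrix by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    size = len(A)
    d = 1.0 + 0j
    for c in range(size):
        piv = max(range(c, size), key=lambda r: abs(A[r][c]))
        if A[piv][c] == 0:
            return 0j
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, size):
            f = A[r][c] / A[c][c]
            for k in range(c, size):
                A[r][k] -= f * A[c][k]
    return d


n, p, reps = 5, 3, 20_000
rng = random.Random(3)
det_draws = [det(sample_wishart_identity(n, p, rng)).real for _ in range(reps)]
gamma_draws = [math.prod(rng.gammavariate(n - i + 1, 1.0) for i in range(1, p + 1))
               for _ in range(reps)]
target = math.perm(n, p)  # n (n-1) ... (n-p+1) = p! * C(n, p)
print(sum(det_draws) / reps, sum(gamma_draws) / reps, target)
```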
Theorem 79 If $W \sim CW_p(n, \Sigma)$, $\Sigma > 0$ and $n \geq p$, then
$$\mathcal{E}\{[\det W]^k\} = [\det\Sigma]^k\,\frac{C\Gamma_p(n+k)}{C\Gamma_p(n)}$$
$$\mathcal{E}\{[\det W]^2\} = [\det\Sigma]^2(p!)^2\binom{n+1}{p}\binom{n}{p}$$
$$\operatorname{var}\{\det W\} = (\det\Sigma)^2(p!)^2\binom{n}{p}\binom{n}{p-1}$$
This is a complexification and generalization of Anderson's lemma (p. 264) [26].


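Proof. This is a complexification and generalization of Anderson's proof. By theorem 77, $\det(\Sigma^{-1}W)$ has the same distribution as $\prod_{i=1}^pU_i$ where the $U_i$ are independent and
$$2U_i \sim \chi^2_{2(n-i+1)}.$$
From Patil et al. (p. 35) [204], we know that if $x \sim \chi^2_m$,
$$\mathcal{E}\{x^k\} = \frac{2^k\Gamma(\frac{m}{2} + k)}{\Gamma(\frac{m}{2})}.$$
Thus
$$\mathcal{E}\{(2U_i)^k\} = \frac{2^k\Gamma(n + k - i + 1)}{\Gamma(n - i + 1)}$$
which implies
$$\mathcal{E}\{U_i^k\} = \frac{\Gamma(n + k - i + 1)}{\Gamma(n - i + 1)}.$$
Since the $U_i$ are independent, then
$$\mathcal{E}\left\{\prod_{i=1}^pU_i^k\right\} = \prod_{i=1}^p\frac{\Gamma(n + k - i + 1)}{\Gamma(n - i + 1)}$$
which implies
$$\mathcal{E}\{[\det W]^k\} = [\det\Sigma]^k\prod_{i=1}^p\frac{\Gamma(n + k - i + 1)}{\Gamma(n - i + 1)} = [\det\Sigma]^k\,\frac{C\Gamma_p(n+k)/\pi^{p(p-1)/2}}{C\Gamma_p(n)/\pi^{p(p-1)/2}} = [\det\Sigma]^k\,\frac{C\Gamma_p(n+k)}{C\Gamma_p(n)}.$$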


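In the special case of $k = 1$,
$$\mathcal{E}\{\det W\} = [\det\Sigma]\,\frac{\Gamma(n+1)\Gamma(n)\cdots\Gamma(n+2-p)}{\Gamma(n)\Gamma(n-1)\cdots\Gamma(n+1-p)} = [\det\Sigma]\,\frac{\Gamma(n+1)}{\Gamma(n+1-p)} = p!\binom{n}{p}\det\Sigma.$$
This is the same answer we got in theorem 78.

When $k = 2$,
$$\begin{aligned}\mathcal{E}\{[\det W]^2\} &= [\det\Sigma]^2\,\frac{\Gamma(n+2)\Gamma(n+1)\cdots\Gamma(n+3-p)}{\Gamma(n)\Gamma(n-1)\cdots\Gamma(n+1-p)} = [\det\Sigma]^2\,\frac{\Gamma(n+2)\Gamma(n+1)}{\Gamma(n+2-p)\Gamma(n+1-p)}\\ &= [\det\Sigma]^2\,\frac{(n+1)!\,n!}{(n+1-p)!\,(n-p)!}.\end{aligned}$$
Therefore,
$$\mathcal{E}\{[\det W]^2\} = [\det\Sigma]^2(p!)^2\binom{n+1}{p}\binom{n}{p}.$$
The variance of $\det W$ is
$$\begin{aligned}\operatorname{var}\{\det W\} &= \mathcal{E}\{[\det W]^2\} - [\mathcal{E}\{\det W\}]^2 = [\det\Sigma]^2(p!)^2\binom{n+1}{p}\binom{n}{p} - [\det\Sigma]^2(p!)^2\binom{n}{p}^2\\ &= [\det\Sigma]^2(p!)^2\binom{n}{p}^2\frac{p}{n+1-p} = [\det\Sigma]^2(p!)^2\binom{n}{p}\binom{n}{p-1}\end{aligned}$$
using $\binom{n+1}{p} = \binom{n}{p}\frac{n+1}{n+1-p}$ and $\binom{n}{p-1} = \binom{n}{p}\frac{p}{n+1-p}$.
□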
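The moment formulas of theorem 79 are easy to check numerically from the product form $\prod_{i=1}^p\Gamma(n+k-i+1)/\Gamma(n-i+1)$. The sketch below (standard library only; the ranges of $n$ and $p$ and the function names are arbitrary choices) compares that product with the closed forms for $k = 1$ and $k = 2$, with the variance expression, and, for $n > p$, with the $k = -1$ value that appears in theorem 86.

```python
import math


def det_moment_ratio(n, p, k):
    """E{[det W]^k} / [det Sigma]^k = prod_{i=1}^p Gamma(n+k-i+1) / Gamma(n-i+1).

    Requires n + k - p + 1 > 0 so that every Gamma argument is positive.
    """
    return math.prod(math.gamma(n + k - i + 1) / math.gamma(n - i + 1)
                     for i in range(1, p + 1))


def closed_forms(n, p):
    """Closed forms from theorems 78, 79 and 86, in units of det Sigma."""
    f = math.factorial(p)
    first = f * math.comb(n, p)
    second = f ** 2 * math.comb(n + 1, p) * math.comb(n, p)
    variance = f ** 2 * math.comb(n, p) ** 2 * p / (n + 1 - p)
    inverse = 1.0 / (f * math.comb(n - 1, p)) if n > p else None
    return first, second, variance, inverse


# Example: p = 3, n = 5 gives first moment 60 and second moment 6**2 * 15 * 10 = 5400.
print(det_moment_ratio(5, 3, 1), det_moment_ratio(5, 3, 2), closed_forms(5, 3))
```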
Theorem 80 Let $W \sim CW_p(n, \Sigma)$ where $\Sigma$ has eigenvalue decomposition $\Sigma = \Gamma\Lambda^2\Gamma^H$. Let $\tilde W = 2W - \Delta(W)$, where $\Delta(W)$ is a diagonal matrix whose diagonal entries are the elements on the main diagonal of $W$. Then the characteristic function of $\operatorname{tr}W$ is
$$\phi_{\operatorname{tr}W}(t) = \prod_{k=1}^p(1 - i\lambda_k^2t)^{-n} = [\det(I_p - i\Lambda^2t)]^{-n}.$$
This is similar to equation (5.58) of Goodman [92].

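Proof. This proof is essentially due to Goodman (p. 169) [92].
$$\phi_{\operatorname{tr}W}(t) = \phi_{\tilde W}(I_pt) = \phi_{\tilde W}(T)$$
where $t$ is a scalar and $T = I_pt$. By Goodman equation 1.7, $\phi_{\tilde W}(T) = [\det(I_p - i\Sigma T)]^{-n}$ where $T^H = T \in \mathbb{C}^{p\times p}$. Then
$$\phi_{\operatorname{tr}W}(t) = [\det(I_p - i\Sigma I_pt)]^{-n} = [\det(I_p - i\Sigma t)]^{-n}.$$
Using the eigenvalue decomposition, we get
$$\begin{aligned}\phi_{\operatorname{tr}W}(t) &= [\det(I_p - i\Gamma\Lambda^2\Gamma^Ht)]^{-n} = [\det(\Gamma\Gamma^H - i\Gamma\Lambda^2\Gamma^Ht)]^{-n}\\ &= [\det\Gamma]^{-n}[\det(I_p - i\Lambda^2t)]^{-n}[\det\Gamma^H]^{-n}\\ &= [\det\Gamma\Gamma^H]^{-n}[\det(I_p - i\Lambda^2t)]^{-n} = [\det(I_p - i\Lambda^2t)]^{-n}.\end{aligned}$$
Since a common use of a characteristic function involves setting $t = 0$, we wish to preserve evidence of dependence on $\Lambda^2$. So,
$$\phi_{\operatorname{tr}W}(t) = [\det\Lambda^2]^{-n}[\det(\Lambda^{-2} - iI_pt)]^{-n} = \prod_{k=1}^p(1 - i\lambda_k^2t)^{-n}.$$
□

Note that since $T^H = T = I_pt$ where $t \in \mathbb{C}$, we know $t \in \mathbb{R}$. Therefore, $\phi_{\operatorname{tr}W}(t)$ exists.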

Note that $\operatorname{tr}W$ is a function of only those elements on the diagonal. Consequently, we can work with the characteristic function of the joint distribution of
$$(W_{11}, \ldots, 2W_{R12}, \ldots, 2W_{I(p-1),p})$$
and still get an answer to the question being asked about $\operatorname{tr}W$.

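Theorem 81 Let $W \sim CW_p(n, \Sigma)$. Then $\mathcal{E}\{\operatorname{tr}W\} = n\operatorname{tr}\Sigma$. This is the complexification of theorem 17.15(e) of Arnold [31].

Proof. This proof is an application of the concepts developed in section B.4. By theorem 80, the characteristic function of $\operatorname{tr}W$ is given by
$$\phi_{\operatorname{tr}W}(t) = [\det(I_p - i\Lambda^2t)]^{-n}$$
where $\Lambda^2$ is the diagonal matrix of eigenvalues of $\Sigma$ and $t \in \mathbb{R}$. Taking the derivative, we obtain the following.
$$\frac{d}{dt}[\det(I_p - i\Lambda^2t)]^{-n} = -n[\det(I_p - i\Lambda^2t)]^{-n-1}\frac{d}{dt}\det(I_p - i\Lambda^2t), \qquad \det(I_p - i\Lambda^2t) = \prod_{k=1}^p(1 - i\lambda_k^2t).$$
We apply the product rule.
$$\frac{d}{dt}\det(I_p - i\Lambda^2t) = \sum_{l=1}^p(-i\lambda_l^2)\prod_{\substack{k=1\\k\neq l}}^p(1 - i\lambda_k^2t).$$
Putting the problem all together,
$$\frac{d}{dt}[\det(I_p - i\Lambda^2t)]^{-n} = in[\det(I_p - i\Lambda^2t)]^{-n-1}\sum_{l=1}^p\lambda_l^2\prod_{\substack{k=1\\k\neq l}}^p(1 - i\lambda_k^2t).$$
We evaluate at $t = 0$, and we obtain
$$\left.\frac{d}{dt}[\det(I_p - i\Lambda^2t)]^{-n}\right|_{t=0} = in\sum_{l=1}^p\lambda_l^2 = in\operatorname{tr}\Lambda^2 = in\operatorname{tr}\Sigma.$$
Recall that
$$\mathcal{E}\{\operatorname{tr}W\} = (-i)\left.\frac{d\phi_{\operatorname{tr}W}(t)}{dt}\right|_{t=0} = (-i)(in\operatorname{tr}\Sigma) = n\operatorname{tr}\Sigma.$$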
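The derivative calculation in the proof of theorem 81 can be mirrored numerically. The sketch below evaluates $\phi_{\operatorname{tr}W}(t) = \prod_k(1 - i\lambda_k^2t)^{-n}$ from theorem 80, approximates $\phi'(0)$ with a central difference, and recovers $\mathcal{E}\{\operatorname{tr}W\} = -i\phi'(0)$. The eigenvalues, $n$, the step size $h$ and the function names are illustrative choices.

```python
def cf_trace(t, eigs, n):
    """phi_{tr W}(t) = prod_k (1 - i * lam_k^2 * t)^(-n) from theorem 80."""
    out = 1.0 + 0j
    for lam2 in eigs:
        out *= (1 - 1j * lam2 * t) ** (-n)
    return out


def mean_trace_numeric(eigs, n, h=1e-5):
    """E{tr W} = -i * phi'(0), with phi'(0) approximated by a central difference."""
    d1 = (cf_trace(h, eigs, n) - cf_trace(-h, eigs, n)) / (2 * h)
    return (-1j * d1).real


eigs = [0.5, 1.0, 2.5]   # hypothetical eigenvalues lam_k^2 of Sigma
n = 6
print(mean_trace_numeric(eigs, n), n * sum(eigs))  # both close to 24
```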


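Theorem 82 Let $W \sim CW_p(n, \Sigma)$. Then
$$\mathcal{E}\{(\operatorname{tr}W)^2\} = n^2(\operatorname{tr}\Sigma)^2 + n\operatorname{tr}(\Sigma^2)$$
and $\operatorname{var}(\operatorname{tr}W) = n\operatorname{tr}(\Sigma^2)$.

Proof. This proof is an application of concepts developed in section B.4. By theorem 80, the characteristic function of $\operatorname{tr}W$ is
$$\phi_{\operatorname{tr}W}(t) = [\det(I_p - i\Lambda^2t)]^{-n}$$
where $\Lambda^2$ is the diagonal matrix of eigenvalues of $\Sigma$. The first derivative with respect to $t$ is
$$\frac{d}{dt}[\det(I_p - i\Lambda^2t)]^{-n} = in[\det(I_p - i\Lambda^2t)]^{-n-1}\sum_{l=1}^p\lambda_l^2\prod_{\substack{k=1\\k\neq l}}^p(1 - i\lambda_k^2t).$$
Apply the chain and product rules for the second derivative.
$$\begin{aligned}\frac{d^2}{dt^2}[\det(I_p - i\Lambda^2t)]^{-n} &= in\left\{i(n+1)[\det(I_p - i\Lambda^2t)]^{-n-2}\left[\sum_{l=1}^p\lambda_l^2\prod_{k\neq l}(1 - i\lambda_k^2t)\right]^2\right.\\ &\qquad\left. + [\det(I_p - i\Lambda^2t)]^{-n-1}\sum_{l=1}^p\lambda_l^2\sum_{\substack{m=1\\m\neq l}}^p(-i\lambda_m^2)\prod_{k\neq l,m}(1 - i\lambda_k^2t)\right\}.\end{aligned}$$
At $t = 0$ the inner double sum is
$$\sum_{l=1}^p\sum_{\substack{m=1\\m\neq l}}^p(-i\lambda_l^2\lambda_m^2) = -i\sum_{l=1}^p\lambda_l^2(\operatorname{tr}\Sigma - \lambda_l^2) = i\operatorname{tr}(\Sigma^2) - i(\operatorname{tr}\Sigma)^2.$$
Assembling all the parts, we get
$$\begin{aligned}\left.\frac{d^2}{dt^2}[\det(I_p - i\Lambda^2t)]^{-n}\right|_{t=0} &= in[i(n+1)(\operatorname{tr}\Sigma)]\sum_{l=1}^p\lambda_l^2 + in[i\operatorname{tr}(\Sigma^2) - i(\operatorname{tr}\Sigma)^2]\\ &= -n(n+1)(\operatorname{tr}\Sigma)^2 - n\operatorname{tr}(\Sigma^2) + n(\operatorname{tr}\Sigma)^2 = -n^2(\operatorname{tr}\Sigma)^2 - n\operatorname{tr}(\Sigma^2).\end{aligned}$$
Then
$$\mathcal{E}\{(\operatorname{tr}W)^2\} = i^{-2}\left.\frac{d^2\phi_{\operatorname{tr}W}(t)}{dt^2}\right|_{t=0} = n^2(\operatorname{tr}\Sigma)^2 + n\operatorname{tr}(\Sigma^2).$$
The variance of $\operatorname{tr}W$ is obtained from
$$\operatorname{var}(\operatorname{tr}W) = \mathcal{E}\{(\operatorname{tr}W)^2\} - [\mathcal{E}\{\operatorname{tr}W\}]^2.$$
From theorem 81 we have $[\mathcal{E}\{\operatorname{tr}W\}]^2 = n^2(\operatorname{tr}\Sigma)^2$. Thus
$$\operatorname{var}(\operatorname{tr}W) = n^2(\operatorname{tr}\Sigma)^2 + n\operatorname{tr}(\Sigma^2) - n^2(\operatorname{tr}\Sigma)^2 = n\operatorname{tr}(\Sigma^2).$$
□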
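The second-derivative computation of theorem 82 can be mirrored the same way. The sketch below approximates $\phi''(0)$ with a second central difference, forms $\mathcal{E}\{(\operatorname{tr}W)^2\} = -\phi''(0)$, and compares it and the implied variance with $n^2(\operatorname{tr}\Sigma)^2 + n\operatorname{tr}(\Sigma^2)$ and $n\operatorname{tr}(\Sigma^2)$. The eigenvalues, $n$, $h$ and the function names are illustrative choices.

```python
def cf_trace(t, eigs, n):
    """phi_{tr W}(t) = prod_k (1 - i * lam_k^2 * t)^(-n) from theorem 80."""
    out = 1.0 + 0j
    for lam2 in eigs:
        out *= (1 - 1j * lam2 * t) ** (-n)
    return out


def second_moment_numeric(eigs, n, h=1e-4):
    """E{(tr W)^2} = i^{-2} phi''(0) = -phi''(0), via a second central difference."""
    d2 = (cf_trace(h, eigs, n) - 2 * cf_trace(0.0, eigs, n) + cf_trace(-h, eigs, n)) / h ** 2
    return (-d2).real


def second_moment_closed(eigs, n):
    """n^2 (tr Sigma)^2 + n tr(Sigma^2), written in terms of the eigenvalues lam_k^2."""
    tr1 = sum(eigs)
    tr2 = sum(lam2 ** 2 for lam2 in eigs)
    return n ** 2 * tr1 ** 2 + n * tr2


eigs = [0.5, 1.0, 2.5]   # hypothetical eigenvalues lam_k^2 of Sigma
n = 6
m2 = second_moment_numeric(eigs, n)
variance = m2 - (n * sum(eigs)) ** 2   # uses E{tr W} = n tr Sigma from theorem 81
print(m2, second_moment_closed(eigs, n), variance, n * sum(l ** 2 for l in eigs))
```

Here $\operatorname{tr}\Sigma = 4$ and $\operatorname{tr}(\Sigma^2) = 7.5$, so the closed forms give $621$ and $45$.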


Theorem 83 Let $A \sim CW_p(n, \Sigma)$. Let $A = T^HT$ where $T$ is upper triangular with positive real values on the diagonal. Then the probability density of $T$ is
$$f(T) = \frac{2^p\operatorname{etr}(-\Sigma^{-1}T^HT)}{[\det\Sigma]^nC\Gamma_p(n)}\prod_{k=1}^pt_{kk}^{2(n-k)+1}\,(dT).$$
This is Goodman equation 5.51 [92].

Proof. The density for $A$ is given by
$$f(A) = \frac{[\det A]^{n-p}\operatorname{etr}(-\Sigma^{-1}A)}{[\det\Sigma]^nC\Gamma_p(n)}(dA).$$
The Jacobian for the change of variables from $A$ to $T$ is given by Goodman equation 5.25 [92] and by theorem 27 as
$$J(A \to T) = 2^p\prod_{k=1}^pt_{kk}^{2(p-k)+1}.$$
Performing the change of variables gives us
$$f(T) = \frac{[\det(T^HT)]^{n-p}\operatorname{etr}(-\Sigma^{-1}T^HT)}{[\det\Sigma]^nC\Gamma_p(n)}\,2^p\prod_{k=1}^pt_{kk}^{2(p-k)+1}\,(dT).$$
Note that
$$[\det(T^HT)]^{n-p} = [\det(T)]^{2(n-p)} = \prod_{k=1}^pt_{kk}^{2(n-p)}.$$
The final result follows by observing that
$$2(n-p) + 2(p-k) + 1 = 2(n-k) + 1.$$
□
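A consequence of theorem 83 that is easy to check by simulation: with $\Sigma = I_p$ the density factors as $\prod_kt_{kk}^{2(n-k)+1}e^{-t_{kk}^2}\prod_{j<k}e^{-|t_{jk}|^2}$, so $t_{kk}^2$ is Gamma$(n-k+1, 1)$ and each off-diagonal $t_{jk}$ is $CN_1(0,1)$. The sketch below computes the upper-triangular factor $T$ in $W = T^HT$ by a complex Cholesky recursion and compares the sample means of $t_{kk}^2$ and $|t_{jk}|^2$ with $n-k+1$ and $1$. Taking $\Sigma = I_p$, the particular $n$, $p$, sample size and seed, and the helper names are assumptions of the sketch.

```python
import random


def complex_normal(rng):
    """One CN_1(0, 1) draw: independent real and imaginary parts, each N(0, 1/2)."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_wishart_identity(n, p, rng):
    """W = Z^H Z with the n rows of Z i.i.d. CN_p(0, I_p), so W ~ CW_p(n, I_p)."""
    W = [[0j] * p for _ in range(p)]
    for _ in range(n):
        z = [complex_normal(rng) for _ in range(p)]
        for j in range(p):
            for k in range(p):
                W[j][k] += z[j].conjugate() * z[k]
    return W


def upper_cholesky(W):
    """Upper-triangular T with positive real diagonal such that W = T^H T."""
    p = len(W)
    T = [[0j] * p for _ in range(p)]
    for j in range(p):
        diag = W[j][j].real - sum(abs(T[a][j]) ** 2 for a in range(j))
        T[j][j] = complex(diag ** 0.5, 0.0)
        for k in range(j + 1, p):
            s = W[j][k] - sum(T[a][j].conjugate() * T[a][k] for a in range(j))
            T[j][k] = s / T[j][j]
    return T


n, p, reps = 5, 3, 20_000
rng = random.Random(5)
diag_means = [0.0] * p
off_mean = 0.0
n_off = p * (p - 1) / 2          # number of strictly upper-triangular entries
for _ in range(reps):
    T = upper_cholesky(sample_wishart_identity(n, p, rng))
    for k in range(p):
        diag_means[k] += abs(T[k][k]) ** 2 / reps
    off_mean += sum(abs(T[j][k]) ** 2 for j in range(p) for k in range(j + 1, p)) / (reps * n_off)
# With 0-based index k, the expected mean of t_kk^2 is n - k, i.e. [5, 4, 3] here.
print(diag_means, [n - k for k in range(p)], off_mean)
```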


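Theorem 84 Let $W \sim CW_p(n, \Sigma)$ where
$$W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix} \quad\text{and}\quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.$$
Let $V = W_{11} - W_{12}W_{22}^{-1}W_{21}$ and $\Sigma_{11\cdot2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, where $W_{11}$ and $\Sigma_{11}$ are $q \times q$. Then $V \sim CW_q(n-p+q, \Sigma_{11\cdot2})$ and $V$ is independent of $W_{12}$ and $W_{22}$. Also, $W_{22} \sim CW_{p-q}(n, \Sigma_{22})$, and $(W_{12} \mid W_{22}) \sim CN_{q,(p-q)}(\Sigma_{12}\Sigma_{22}^{-1}W_{22}, \Sigma_{11\cdot2}, W_{22})$. This is a complexification of Muirhead's theorem 3.2.10 [187].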

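Proof. This is a complexification and expansion of Muirhead's proof. Let $B = \Sigma^{-1/2}W(\Sigma^{-1/2})^H$. Note that $W = W^H$ and $\Sigma = \Sigma^H$, where $\Sigma^{1/2}$ is the positive definite square root of $\Sigma$, so that $\Sigma = \Sigma^{1/2}(\Sigma^{1/2})^H$ by theorem 119. Perform the following change of variables. Let $V = W_{11} - W_{12}W_{22}^{-1}W_{21}$, $B_{12} = W_{12}$, $B_{22} = W_{22}$. Recall that $B_{21} = W_{21} = B_{12}^H$. Thus,
$$(dW) = (dW_{11}) \wedge (dW_{12}) \wedge (dW_{22}) = (dV) \wedge (dB_{12}) \wedge (dB_{22}).$$
Recall that
$$\det W = (\det W_{22})\det(W_{11} - W_{12}W_{22}^{-1}W_{21}) = (\det B_{22})\det V$$
and
$$\det\Sigma = (\det\Sigma_{22})\det\Sigma_{11\cdot2}.$$
Let
$$C = \Sigma^{-1} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$$
where $C_{11}$ is $q \times q$. Then
$$\begin{aligned}\operatorname{tr}(\Sigma^{-1}W) &= \operatorname{tr}\left(\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}\begin{bmatrix} V + B_{12}B_{22}^{-1}B_{21} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}\right)\\ &= \operatorname{tr}(C_{11}V + C_{11}B_{12}B_{22}^{-1}B_{21} + C_{12}B_{21}) + \operatorname{tr}(C_{21}B_{12} + C_{22}B_{22})\\ &= \operatorname{tr}(C_{11}V) + \operatorname{tr}(C_{11}B_{12}B_{22}^{-1}B_{21}) + \operatorname{tr}(C_{12}B_{21}) + \operatorname{tr}(C_{21}B_{12}) + \operatorname{tr}(C_{22}B_{22}).\end{aligned}$$
Observe that
$$\begin{aligned}&\operatorname{tr}[C_{11}(B_{12} + C_{11}^{-1}C_{12}B_{22})B_{22}^{-1}(B_{12} + C_{11}^{-1}C_{12}B_{22})^H] + \operatorname{tr}[B_{22}(C_{22} - C_{21}C_{11}^{-1}C_{12})] + \operatorname{tr}[C_{11}V]\\ &= \operatorname{tr}[C_{11}B_{12}B_{22}^{-1}(B_{12} + C_{11}^{-1}C_{12}B_{22})^H] + \operatorname{tr}[C_{12}(B_{12} + C_{11}^{-1}C_{12}B_{22})^H] + \operatorname{tr}[B_{22}(C_{22} - C_{21}C_{11}^{-1}C_{12})] + \operatorname{tr}[C_{11}V]\\ &= \operatorname{tr}[C_{11}B_{12}B_{22}^{-1}B_{12}^H] + \operatorname{tr}[C_{11}B_{12}B_{22}^{-1}B_{22}^HC_{12}^HC_{11}^{-H}] + \operatorname{tr}[C_{12}B_{12}^H]\\ &\quad + \operatorname{tr}[C_{12}B_{22}^HC_{12}^HC_{11}^{-H}] + \operatorname{tr}[B_{22}C_{22}] - \operatorname{tr}[B_{22}C_{21}C_{11}^{-1}C_{12}] + \operatorname{tr}[C_{11}V].\end{aligned}$$
Recall that $B_{22} = B_{22}^H$, $B_{21} = B_{12}^H$, $C_{21} = C_{12}^H$, and $C_{11}^{-1} = C_{11}^{-H}$. The expansion continues as
$$\begin{aligned}&\operatorname{tr}[C_{11}B_{12}B_{22}^{-1}B_{21}] + \operatorname{tr}[C_{11}B_{12}C_{21}C_{11}^{-1}] + \operatorname{tr}[C_{12}B_{21}]\\ &\quad + \operatorname{tr}[C_{12}B_{22}C_{21}C_{11}^{-1}] + \operatorname{tr}[B_{22}C_{22}] - \operatorname{tr}[B_{22}C_{21}C_{11}^{-1}C_{12}] + \operatorname{tr}[C_{11}V].\end{aligned}$$
Recall that $\operatorname{tr}(ABC) = \operatorname{tr}(CAB)$. This allows us to produce the expansion
$$\begin{aligned}&\operatorname{tr}[C_{11}B_{12}B_{22}^{-1}B_{21}] + \operatorname{tr}[B_{12}C_{21}] + \operatorname{tr}[C_{12}B_{21}] + \operatorname{tr}[C_{12}B_{22}C_{21}C_{11}^{-1}]\\ &\quad + \operatorname{tr}[B_{22}C_{22}] - \operatorname{tr}[C_{12}B_{22}C_{21}C_{11}^{-1}] + \operatorname{tr}[C_{11}V]\\ &= \operatorname{tr}[C_{11}B_{12}B_{22}^{-1}B_{21}] + \operatorname{tr}[B_{12}C_{21}] + \operatorname{tr}[C_{12}B_{21}] + \operatorname{tr}[B_{22}C_{22}] + \operatorname{tr}[C_{11}V]\\ &= \operatorname{tr}[C_{11}V] + \operatorname{tr}[C_{11}B_{12}B_{22}^{-1}B_{21}] + \operatorname{tr}[C_{12}B_{21}] + \operatorname{tr}[C_{21}B_{12}] + \operatorname{tr}[C_{22}B_{22}].\end{aligned}$$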

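This is the same expression we had for $\operatorname{tr}[\Sigma^{-1}W]$. From the partitioned matrix inverse, we know that
$$C_{11} = \Sigma_{11\cdot2}^{-1} = (\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1}, \qquad C_{22} - C_{21}C_{11}^{-1}C_{12} = \Sigma_{22}^{-1}, \qquad C_{11}^{-1}C_{12} = -\Sigma_{12}\Sigma_{22}^{-1}.$$
To see this, look at the inverse from both directions. From
$$C = \Sigma^{-1} = \begin{bmatrix} \Sigma_{11\cdot2}^{-1} & -\Sigma_{11\cdot2}^{-1}\Sigma_{12}\Sigma_{22}^{-1} \\ -\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11\cdot2}^{-1} & \Sigma_{22\cdot1}^{-1} \end{bmatrix}$$
we observe $C_{11} = \Sigma_{11\cdot2}^{-1}$ and
$$\begin{aligned}C_{22} - C_{21}C_{11}^{-1}C_{12} &= \Sigma_{22\cdot1}^{-1} - \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11\cdot2}^{-1}\Sigma_{11\cdot2}\Sigma_{11\cdot2}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\\ &= \Sigma_{22}^{-1}(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})\Sigma_{22\cdot1}^{-1} = \Sigma_{22}^{-1},\end{aligned}$$
using $\Sigma_{11\cdot2}^{-1}\Sigma_{12}\Sigma_{22}^{-1} = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22\cdot1}^{-1}$. From
$$\Sigma = C^{-1} = \begin{bmatrix} C_{11\cdot2}^{-1} & -C_{11}^{-1}C_{12}C_{22\cdot1}^{-1} \\ -C_{22}^{-1}C_{21}C_{11\cdot2}^{-1} & C_{22\cdot1}^{-1} \end{bmatrix}$$
we observe
$$-\Sigma_{12}\Sigma_{22}^{-1} = C_{11}^{-1}C_{12}C_{22\cdot1}^{-1}C_{22\cdot1} = C_{11}^{-1}C_{12}.$$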

Recall that the complex Wishart density function is given by
$$f(W) = \frac{|\det W|^{n-p}\operatorname{etr}[-\Sigma^{-1}W]}{[\det\Sigma]^nC\Gamma_p(n)}.$$
Note that $|\det W| = \det W$ since $W$ is positive definite. Substituting in our results, we get
$$\begin{aligned}f(W) &= \frac{[(\det B_{22})(\det V)]^{n-p}}{[(\det\Sigma_{22})(\det\Sigma_{11\cdot2})]^n\,\pi^{p(p-1)/2}\prod_{i=1}^p\Gamma(n-i+1)}\\ &\quad\times\operatorname{etr}\left[-C_{11}V - B_{22}(C_{22} - C_{21}C_{11}^{-1}C_{12}) - C_{11}(B_{12} + C_{11}^{-1}C_{12}B_{22})B_{22}^{-1}(B_{12} + C_{11}^{-1}C_{12}B_{22})^H\right]\\ &= \frac{(\det V)^{n-p}\operatorname{etr}(-\Sigma_{11\cdot2}^{-1}V)}{(\det\Sigma_{11\cdot2})^{n-p+q}\,\pi^{q(q-1)/2}\prod_{i=1}^q\Gamma(n-p+q-i+1)}\\ &\quad\times\frac{(\det B_{22})^{n-p+q}\operatorname{etr}(-\Sigma_{22}^{-1}B_{22})}{(\det\Sigma_{22})^n\,\pi^{(p-q)(p-q-1)/2}\prod_{i=1}^{p-q}\Gamma(n-i+1)}\\ &\quad\times\frac{\operatorname{etr}\left[-\Sigma_{11\cdot2}^{-1}(B_{12} - \Sigma_{12}\Sigma_{22}^{-1}B_{22})B_{22}^{-1}(B_{12} - \Sigma_{12}\Sigma_{22}^{-1}B_{22})^H\right]}{\pi^{(p-q)q}(\det\Sigma_{11\cdot2})^{p-q}(\det B_{22})^q}.\end{aligned}$$
Note that the exponents of $\pi$ obey
$$\begin{aligned}\tfrac12q(q-1) + \tfrac12(p-q)(p-q-1) + (p-q)q &= \tfrac12q(q-1) + \tfrac12(p-q)(p+q-1)\\ &= \tfrac12q(q-1) + \tfrac12(p-q)(q-1) + \tfrac12(p-q)p\\ &= \tfrac12p(q-1) + \tfrac12p(p-q) = \tfrac12p(p-1).\end{aligned}$$
Also note that
$$\begin{aligned}\left[\prod_{i=1}^q\Gamma(n-p+q-i+1)\right]\left[\prod_{i=1}^{p-q}\Gamma(n-i+1)\right] &= \Gamma(n-p+q)\Gamma(n-p+q-1)\cdots\Gamma(n-p+1)\,\Gamma(n)\Gamma(n-1)\cdots\Gamma(n-p+q+1)\\ &= \Gamma(n)\Gamma(n-1)\cdots\Gamma(n-p+1) = \prod_{i=1}^p\Gamma(n-i+1).\end{aligned}$$
Thus, $f(W)$ is the product of three density functions. The first one is the density function for $V$. It is distributed $CW_q(n-p+q, \Sigma_{11\cdot2})$. The second term is the density function for $B_{22} = W_{22}$. It is distributed $CW_{p-q}(n, \Sigma_{22})$. The last term is the conditional density of $B_{12} = W_{12}$, given that $B_{22} = W_{22}$ is fixed. It is distributed
$$CN_{q,(p-q)}(\Sigma_{12}\Sigma_{22}^{-1}W_{22}, \Sigma_{11\cdot2}, W_{22}).$$
In conclusion, $V$ is independent of $W_{22}$ and $(W_{12} \mid W_{22})$, and therefore is independent of $W_{12}$. □
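Theorem 84 can be illustrated with $p = 2$ and $q = 1$, where $V = W_{11} - |W_{12}|^2/W_{22}$ is a scalar. The sketch below simulates $W \sim CW_2(n, \Sigma)$ with $\Sigma = B^HB$ for a hypothetical $B$, and checks that the sample mean of $V$ is close to $(n-1)\Sigma_{11\cdot2}$ (the mean of $CW_1(n-p+q, \Sigma_{11\cdot2})$), that the mean of $W_{22}$ is close to $n\Sigma_{22}$, and that the sample correlation between $V$ and $W_{22}$ is near zero. Zero correlation is only a necessary condition for the independence the theorem asserts, so this is a sanity check rather than a test of independence. The helper names, $n$, sample size and seed are assumptions of the sketch.

```python
import random


def complex_normal(rng):
    """One CN_1(0, 1) draw: independent real and imaginary parts, each N(0, 1/2)."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_wishart(n, B, rng):
    """W = Z^H Z with rows e B, e ~ CN_p(0, I), so W ~ CW_p(n, B^H B)."""
    p = len(B)
    W = [[0j] * p for _ in range(p)]
    for _ in range(n):
        e = [complex_normal(rng) for _ in range(p)]
        z = [sum(e[a] * B[a][k] for a in range(p)) for k in range(p)]
        for j in range(p):
            for k in range(p):
                W[j][k] += z[j].conjugate() * z[k]
    return W


B = [[1.0, 0.5 + 0.5j], [0.0, 0.8]]   # hypothetical factor; Sigma = B^H B
Sigma = [[sum(B[r][j].conjugate() * B[r][k] for r in range(2)) for k in range(2)]
         for j in range(2)]
sigma_11_2 = (Sigma[0][0] - Sigma[0][1] * Sigma[1][0] / Sigma[1][1]).real

n, reps = 5, 20_000
rng = random.Random(17)
V_draws, W22_draws = [], []
for _ in range(reps):
    W = sample_wishart(n, B, rng)
    V_draws.append((W[0][0] - W[0][1] * W[1][0] / W[1][1]).real)  # V = W11 - W12 W22^{-1} W21
    W22_draws.append(W[1][1].real)

mean_V = sum(V_draws) / reps
mean_W22 = sum(W22_draws) / reps
cov = sum((v - mean_V) * (w - mean_W22) for v, w in zip(V_draws, W22_draws)) / (reps - 1)
sd_V = (sum((v - mean_V) ** 2 for v in V_draws) / (reps - 1)) ** 0.5
sd_W22 = (sum((w - mean_W22) ** 2 for w in W22_draws) / (reps - 1)) ** 0.5
corr = cov / (sd_V * sd_W22)
print(mean_V, (n - 1) * sigma_11_2, mean_W22, n * Sigma[1][1].real, corr)
```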

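Corollary 23 Let $W \sim CW_p(n, \Sigma)$ and let $X = W_{22} - W_{21}W_{11}^{-1}W_{12}$ and $\Sigma_{22\cdot1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, where $W_{22}$ and $\Sigma_{22}$ are $(p-q) \times (p-q)$. Then $X \sim CW_{p-q}(n-q, \Sigma_{22\cdot1})$ and $X$ is independent of $W_{21}$ and $W_{11}$. Also, $W_{11} \sim CW_q(n, \Sigma_{11})$ and $(W_{21} \mid W_{11}) \sim CN_{(p-q),q}(\Sigma_{21}\Sigma_{11}^{-1}W_{11}, \Sigma_{22\cdot1}, W_{11})$. This is a corollary to a complexification of Muirhead's theorem 3.2.10 [187].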

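Proof. This follows the general logic of Muirhead's proof of theorem 3.2.10, modified by the different partition of interest. We perform the change of variables $X = W_{22} - W_{21}W_{11}^{-1}W_{12}$, $B_{12} = W_{12}$, $B_{11} = W_{11}$. Recall $B_{21} = W_{21} = B_{12}^H$. Then
$$(dW) = (dW_{11}) \wedge (dW_{12}) \wedge (dW_{22}) = (dB_{11}) \wedge (dB_{12}) \wedge (dX).$$
Note that
$$\det W = (\det W_{11})\det(W_{22} - W_{21}W_{11}^{-1}W_{12})$$
by lemma 45. Thus
$$\det W = (\det W_{11})\det X$$
and
$$\det\Sigma = (\det\Sigma_{11})\det\Sigma_{22\cdot1}.$$
Let
$$C = \Sigma^{-1} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$$
where $C_{11}$ is $q \times q$. Then
$$\begin{aligned}\operatorname{tr}(\Sigma^{-1}W) &= \operatorname{tr}\left(\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & X + B_{21}B_{11}^{-1}B_{12} \end{bmatrix}\right)\\ &= \operatorname{tr}(C_{11}B_{11} + C_{12}B_{21}) + \operatorname{tr}(C_{21}B_{12} + C_{22}X + C_{22}B_{21}B_{11}^{-1}B_{12}).\end{aligned}$$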


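Here
$$C = \Sigma^{-1} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} \Sigma_{11\cdot2}^{-1} & -\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22\cdot1}^{-1} \\ -\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11\cdot2}^{-1} & \Sigma_{22\cdot1}^{-1} \end{bmatrix},$$
so that
$$\begin{aligned}C_{11} - C_{12}C_{22}^{-1}C_{21} &= \Sigma_{11\cdot2}^{-1} - (-\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22\cdot1}^{-1})\Sigma_{22\cdot1}(-\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11\cdot2}^{-1})\\ &= \Sigma_{11\cdot2}^{-1} - \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11\cdot2}^{-1}\\ &= (I - \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})\Sigma_{11\cdot2}^{-1} = \Sigma_{11}^{-1}(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})\Sigma_{11\cdot2}^{-1} = \Sigma_{11}^{-1}.\end{aligned}$$
Note that
$$\Sigma = C^{-1} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} = \begin{bmatrix} C_{11\cdot2}^{-1} & -C_{11}^{-1}C_{12}C_{22\cdot1}^{-1} \\ -C_{22}^{-1}C_{21}C_{11\cdot2}^{-1} & C_{22\cdot1}^{-1} \end{bmatrix},$$
so
$$-\Sigma_{21}\Sigma_{11}^{-1} = C_{22}^{-1}C_{21}C_{11\cdot2}^{-1}C_{11\cdot2} = C_{22}^{-1}C_{21}.$$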
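Then
$$\begin{aligned}&\operatorname{tr}[C_{22}(B_{21} + C_{22}^{-1}C_{21}B_{11})B_{11}^{-1}(B_{21} + C_{22}^{-1}C_{21}B_{11})^H] + \operatorname{tr}[B_{11}(C_{11} - C_{12}C_{22}^{-1}C_{21})] + \operatorname{tr}(C_{22}X)\\ &= \operatorname{tr}[C_{22}B_{21}B_{11}^{-1}(B_{21} + C_{22}^{-1}C_{21}B_{11})^H] + \operatorname{tr}[C_{21}(B_{21} + C_{22}^{-1}C_{21}B_{11})^H] + \operatorname{tr}[B_{11}(C_{11} - C_{12}C_{22}^{-1}C_{21})] + \operatorname{tr}(C_{22}X)\\ &= \operatorname{tr}[C_{22}B_{21}B_{11}^{-1}B_{21}^H] + \operatorname{tr}[C_{22}B_{21}B_{11}^{-1}B_{11}^HC_{21}^HC_{22}^{-H}] + \operatorname{tr}[C_{21}B_{21}^H] + \operatorname{tr}[C_{21}B_{11}^HC_{21}^HC_{22}^{-H}]\\ &\quad + \operatorname{tr}[B_{11}C_{11}] - \operatorname{tr}[B_{11}C_{12}C_{22}^{-1}C_{21}] + \operatorname{tr}(C_{22}X).\end{aligned}$$
Recall that $B_{11} = B_{11}^H$, $B_{12} = B_{21}^H$, $C_{12} = C_{21}^H$, and $C_{22}^{-1} = C_{22}^{-H}$. We use this to simplify the notation to
$$\begin{aligned}&\operatorname{tr}[C_{22}B_{21}B_{11}^{-1}B_{12}] + \operatorname{tr}[C_{22}B_{21}C_{12}C_{22}^{-1}] + \operatorname{tr}[C_{21}B_{12}] + \operatorname{tr}[C_{21}B_{11}C_{12}C_{22}^{-1}]\\ &\quad + \operatorname{tr}[B_{11}C_{11}] - \operatorname{tr}[B_{11}C_{12}C_{22}^{-1}C_{21}] + \operatorname{tr}[C_{22}X]\\ &= \operatorname{tr}[C_{22}B_{21}B_{11}^{-1}B_{12}] + \operatorname{tr}[B_{21}C_{12}] + \operatorname{tr}[C_{21}B_{12}] + \operatorname{tr}[C_{21}B_{11}C_{12}C_{22}^{-1}]\\ &\quad + \operatorname{tr}[B_{11}C_{11}] - \operatorname{tr}[B_{11}C_{12}C_{22}^{-1}C_{21}] + \operatorname{tr}[C_{22}X]\\ &= \operatorname{tr}[C_{22}B_{21}B_{11}^{-1}B_{12}] + \operatorname{tr}[B_{21}C_{12}] + \operatorname{tr}[C_{21}B_{12}] + \operatorname{tr}[B_{11}C_{11}] + \operatorname{tr}[C_{22}X]\\ &= \operatorname{tr}[C_{22}X] + \operatorname{tr}[C_{22}B_{21}B_{11}^{-1}B_{12}] + \operatorname{tr}[C_{21}B_{12}] + \operatorname{tr}[C_{12}B_{21}] + \operatorname{tr}[C_{11}B_{11}].\end{aligned}$$
This agrees with the expression obtained above:
$$\operatorname{tr}(\Sigma^{-1}W) = \operatorname{tr}[C_{11}B_{11}] + \operatorname{tr}[C_{12}B_{21}] + \operatorname{tr}[C_{21}B_{12}] + \operatorname{tr}[C_{22}X] + \operatorname{tr}[C_{22}B_{21}B_{11}^{-1}B_{12}].$$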
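$$C_{22} = \Sigma_{22\cdot1}^{-1} = (\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1}, \qquad C_{11} - C_{12}C_{22}^{-1}C_{21} = \Sigma_{11}^{-1}, \qquad C_{22}^{-1}C_{21} = -\Sigma_{21}\Sigma_{11}^{-1}.$$
We recognize some of the pieces as follows.
$$|\det W|^{n-p} = |(\det B_{11})(\det X)|^{n-p}$$
and
$$\exp[-\operatorname{tr}(\Sigma^{-1}W)] = \exp\left\{-\operatorname{tr}[C_{22}X] - \operatorname{tr}[B_{11}(C_{11} - C_{12}C_{22}^{-1}C_{21})] - \operatorname{tr}\left[C_{22}(B_{21} + C_{22}^{-1}C_{21}B_{11})B_{11}^{-1}(B_{21} + C_{22}^{-1}C_{21}B_{11})^H\right]\right\}$$
$$[\det\Sigma]^n = [(\det\Sigma_{11})(\det\Sigma_{22\cdot1})]^n$$
$$C\Gamma_p(n) = \pi^{p(p-1)/2}\prod_{i=1}^p\Gamma(n-i+1).$$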

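We expect
$$CW_p(n, \Sigma) = CW_{p-q}(n-q, \Sigma_{22\cdot1}) \cdot CW_q(n, \Sigma_{11}) \cdot CN_{(p-q),q}(\Sigma_{21}\Sigma_{11}^{-1}W_{11}, \Sigma_{22\cdot1}, W_{11}).$$
We look at the density functions.
$$f[CW_p(n, \Sigma)] = \frac{|\det W|^{n-p}\operatorname{etr}[-\Sigma^{-1}W]}{(\det\Sigma)^nC\Gamma_p(n)}$$
$$f[CW_{p-q}(n-q, \Sigma_{22\cdot1})] = \frac{|\det X|^{n-q-(p-q)}\operatorname{etr}[-\Sigma_{22\cdot1}^{-1}X]}{(\det\Sigma_{22\cdot1})^{n-q}C\Gamma_{p-q}(n-q)}$$
$$f[CW_q(n, \Sigma_{11})] = \frac{|\det B_{11}|^{n-q}\operatorname{etr}[-\Sigma_{11}^{-1}B_{11}]}{(\det\Sigma_{11})^nC\Gamma_q(n)}$$
$$f[CN_{(p-q),q}(\Sigma_{21}\Sigma_{11}^{-1}B_{11}, \Sigma_{22\cdot1}, B_{11})] = \frac{\operatorname{etr}\left[-(\Sigma_{22\cdot1})^{-1}(B_{21} - \Sigma_{21}\Sigma_{11}^{-1}B_{11})B_{11}^{-1}(B_{21} - \Sigma_{21}\Sigma_{11}^{-1}B_{11})^H\right]}{\pi^{q(p-q)}|\Sigma_{22\cdot1}|^q|B_{11}|^{p-q}}$$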

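To show the claimed product is true, we observe
$$[\det\Sigma_{22\cdot1}]^{n-q}[\det\Sigma_{11}]^n[\det\Sigma_{22\cdot1}]^q = [\det\Sigma]^n$$
$$|\det X|^{n-p}|\det B_{11}|^{n-q}|\det B_{11}|^{-(p-q)} = |\det X|^{n-p}|\det B_{11}|^{n-p} = |\det W|^{n-p}$$
$$C\Gamma_{p-q}(n-q)\,C\Gamma_q(n)\,\pi^{q(p-q)} = \left[\pi^{(p-q)(p-q-1)/2}\prod_{i=1}^{p-q}\Gamma(n-q-i+1)\right]\left[\pi^{q(q-1)/2}\prod_{i=1}^q\Gamma(n-i+1)\right]\pi^{q(p-q)}.$$
Exponents of $\pi$ are
$$\begin{aligned}\tfrac12(p-q)(p-q-1) + \tfrac12q(q-1) + q(p-q) &= \tfrac12(p-q)(p-q-1+2q) + \tfrac12q(q-1) = \tfrac12[(p-q)(p+q-1) + q(q-1)]\\ &= \tfrac12[p(p-q) + (p-q)(q-1) + q(q-1)] = \tfrac12[p(p-q) + p(q-1)] = \tfrac12p(p-1).\end{aligned}$$
The product of the Gamma functions is
$$\left[\prod_{i=1}^{p-q}\Gamma(n-q-i+1)\right]\left[\prod_{i=1}^q\Gamma(n-i+1)\right] = \left[\prod_{i=q+1}^p\Gamma(n-q-(i-q)+1)\right]\left[\prod_{i=1}^q\Gamma(n-i+1)\right] = \prod_{i=1}^p\Gamma(n-i+1).$$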
Theorem 85 Let $W \sim CW_p(n, \Sigma)$, $\Sigma > 0$, and let $A = W^{-1}$ and $G = \Sigma^{-1}$. Then $A \sim CW_p^{-1}(n, G)$, and the density function of $A$ is
$$f_A(A) = \frac{|\det A|^{-(n+p)}\operatorname{etr}(-GA^{-1})[\det G]^n}{C\Gamma_p(n)}(dA).$$
This is the Complex Inverted Wishart Distribution. This is a complex generalization of Mardia et al. equation 3.8.2 [171], and also a complexification of theorem 7.7.1 of Anderson [26]. The real variables case is also reported in Siskind [247].


Proof. This is a complexification of Anderson's proof. Recall that the density for the complex Wishart distribution is given by
$$f_W(W) = \frac{|\det W|^{n-p}\operatorname{etr}(-\Sigma^{-1}W)}{\pi^{p(p-1)/2}[\det\Sigma]^n\prod_{i=1}^p\Gamma(n-i+1)}.$$
The Jacobian for the complex change of variables $W = A^{-1}$, where $W^H = W > 0$, is given in theorem 40 to be $J(W \to A) = |\det A|^{-2p}$. Thus $f_A(A) = f_W(A^{-1})J(W \to A)$, that is,
$$f_A(A) = \frac{|\det A^{-1}|^{n-p}\operatorname{etr}(-GA^{-1})|\det A|^{-2p}}{[\det G^{-1}]^n\pi^{p(p-1)/2}\prod_{i=1}^p\Gamma(n-i+1)}.$$
Note that $-n + p - 2p = -n - p = -(n+p)$.
$$f_A(A) = \frac{[\det G]^n|\det A|^{-(n+p)}\operatorname{etr}(-GA^{-1})}{\pi^{p(p-1)/2}\prod_{i=1}^p\Gamma(n-i+1)} = \frac{|\det A|^{-(n+p)}\operatorname{etr}(-GA^{-1})[\det G]^n}{C\Gamma_p(n)}.$$
□
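Theorem 86 Let $W \sim CW_p(n, \Sigma)$, $\Sigma > 0$, $n > p$. Then
$$\mathcal{E}\{[\det W]^{-1}\} = \frac{1}{p!\binom{n-1}{p}\det\Sigma}.$$
This is a complexification of Arnold's theorem 17.15(c)(ii) [31], which was stated without proof.

Proof. This proof was motivated by Mardia et al. [171] (p. 487) equation B.3.6. First, recall for the $\chi^2_m$ distribution that if $x \sim \chi^2_m$,
$$\mathcal{E}\{x^k\} = \frac{2^k\Gamma(\frac{m}{2} + k)}{\Gamma(\frac{m}{2})}.$$
When $k = -1$ then
$$\mathcal{E}\{x^{-1}\} = \frac{1}{2}\,\frac{\Gamma(\frac{m}{2} - 1)}{\Gamma(\frac{m}{2})} = \frac{\Gamma(\frac{m}{2} - 1)}{2(\frac{m}{2} - 1)\Gamma(\frac{m}{2} - 1)} = \frac{1}{2(\frac{m}{2} - 1)}.$$
This implies $\mathcal{E}\{1/x\} = 1/(m-2)$ for $m > 2$. By theorem 77, we know $\det(\Sigma^{-1}W)$ has the same distribution as $\prod_{i=1}^pU_i$ where the $U_i$ are independent and $2U_i \sim \chi^2_{2(n-i+1)}$. Thus, $\det(\Sigma W^{-1})$ has the same distribution as $\prod_{i=1}^pU_i^{-1}$ where $2U_i \sim \chi^2_{2(n-i+1)}$.
$$\mathcal{E}\left\{\frac{1}{2U_i}\right\} = \frac{1}{2(n-i+1) - 2} = \frac{1}{2(n-i)}$$
implies $\mathcal{E}\{1/U_i\} = 1/(n-i)$. Therefore,
$$\mathcal{E}\{\det(\Sigma W^{-1})\} = \prod_{i=1}^p\frac{1}{n-i}$$
so that
$$\mathcal{E}\{[\det W]^{-1}\} = \frac{1}{(n-1)(n-2)\cdots(n-p)[\det\Sigma]} = \frac{1}{p!\binom{n-1}{p}[\det\Sigma]}.$$
□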
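Theorem 86 can be checked by simulation in the smallest nontrivial case. The sketch below draws $W \sim CW_2(n, I_2)$, uses the explicit $2 \times 2$ determinant, and compares the sample mean of $1/\det W$ with $1/\bigl(p!\binom{n-1}{p}\bigr) = 1/((n-1)(n-2))$. The choice $\Sigma = I_2$, the value $n = 6$ (large enough that $1/\det W$ has a finite variance), the seed and the helper names are assumptions of the sketch.

```python
import math
import random


def complex_normal(rng):
    """One CN_1(0, 1) draw: independent real and imaginary parts, each N(0, 1/2)."""
    s = 0.5 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_wishart_2x2_identity(n, rng):
    """W = Z^H Z with the n rows of Z i.i.d. CN_2(0, I_2), so W ~ CW_2(n, I_2)."""
    W = [[0j, 0j], [0j, 0j]]
    for _ in range(n):
        z = [complex_normal(rng), complex_normal(rng)]
        for j in range(2):
            for k in range(2):
                W[j][k] += z[j].conjugate() * z[k]
    return W


n, p, reps = 6, 2, 20_000
rng = random.Random(29)
inv_dets = []
for _ in range(reps):
    W = sample_wishart_2x2_identity(n, rng)
    det_W = (W[0][0] * W[1][1] - W[0][1] * W[1][0]).real  # real because W is Hermitian
    inv_dets.append(1.0 / det_W)
target = 1.0 / (math.factorial(p) * math.comb(n - 1, p))   # equals 1 / ((n-1)(n-2)) here
print(sum(inv_dets) / reps, target)
```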

Theorem  87  Let  W  ~  CM^p(n, S),S  >  0,n  >  p.  Then 

This  is  a  complexification  of  Arnold’s  theorem  17.15(d)  [31],  which  was  stated 

without  proof. 

Proof.  Let  V  ~  CWp{n,S})  where  =  diag(Af,  •  •  • ,  Ap).  Let  a  be  a 
column  vector  with  zeros  in  every  position  except  for  a  1  in  position  i.  Let  V" 
be  the  element  in  position  (i,i)  of  Then  by  theorem  64, 

„a"A-^a  (a)  2  , 

^  ya  Xfyii  A:2(n-p+l)l^l 


Then 


= - \ - = - L_ 

12  ’  /  2(n-p+l)-2  2(n-, 


2(n  — p+1)  — 2  2(n— p) 

which  implies  €{V*'}=  Thus  S{V~^)=  where  n  >  p. 

Let  E  =  PA^r",  W  =  TVT^.  By  theorem  54, 

W  ~  CVPpKPA^r")  =  CVPp(n,E) 


Then 


r£{v-^}r"  =  £:{rv-‘r"}  =  5{(rKr")-'}  = 


1  .py-ipw  ^ 


n  —  p 


n  —  p 


where  n  >  p. 


Note:  e{\W-^{}  ^  |£: {VV'-*}! .  In  fact, 
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1^  K'}|  =  =  (p- ("_  !)^(K-'|} 

□ 
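A hedged Monte Carlo check of theorem 87 for $p = 2$ with a non-diagonal $\Sigma = GG^{H}$ follows. The lower-triangular factor $G$ (giving $\Sigma = \begin{pmatrix}1 & 0.5-0.5i\\ 0.5+0.5i & 1.5\end{pmatrix}$, $\det\Sigma = 1$) is an arbitrary illustrative choice, and the small 2x2 helpers are hypothetical stand-ins for a linear algebra library so the block runs on the standard library alone.

```python
import math
import random


def matmul2(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def herm2(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def inv2(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def mc_mean_inverse_wishart(n, g, trials, seed=2):
    """Monte Carlo estimate of E{W^{-1}} for W ~ CW_2(n, G G^H).

    Each z_m = G x_m with x_m ~ CN_2(0, I), so z_m ~ CN_2(0, G G^H).
    """
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    acc = [[0j, 0j], [0j, 0j]]
    for _ in range(trials):
        w = [[0j, 0j], [0j, 0j]]
        for _ in range(n):
            x = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(2)]
            z = [g[i][0] * x[0] + g[i][1] * x[1] for i in range(2)]
            for i in range(2):
                for j in range(2):
                    w[i][j] += z[i] * z[j].conjugate()
        wi = inv2(w)
        for i in range(2):
            for j in range(2):
                acc[i][j] += wi[i][j] / trials
    return acc


def theorem87_mean(n, sigma, p=2):
    """Theorem 87: E{W^{-1}} = Sigma^{-1} / (n - p), valid for n > p."""
    si = inv2(sigma)
    return [[si[i][j] / (n - p) for j in range(2)] for i in range(2)]
```

For example, with `g = [[1 + 0j, 0j], [0.5 + 0.5j, 1 + 0j]]` and `sigma = matmul2(g, herm2(g))`, `theorem87_mean(8, sigma)` has upper-left entry $1.5/6 = 0.25$, and `mc_mean_inverse_wishart(8, g, 10000)` should agree entrywise to about two decimal places.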

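Theorem 88 Let $W \sim \mathcal{CW}_p(n, \Sigma)$, $\Sigma > 0$. Then $\mathcal{E}\{\operatorname{tr}W\} = n\operatorname{tr}\Sigma$. If $n > p$, then $\mathcal{E}\{\operatorname{tr}W^{-1}\} = \operatorname{tr}\Sigma^{-1}/(n-p)$. This is a complexification of Arnold's theorem 17.15(e) [31], which was stated without proof.

Proof. By theorem 52, $\mathcal{E}\{W\} = n\Sigma$. The trace function is merely a linear combination of elements on the diagonal of a matrix, and expectation is a linear operator. Therefore

$$ \mathcal{E}\{\operatorname{tr}W\} = \operatorname{tr}\mathcal{E}\{W\} = \operatorname{tr}[n\Sigma] = n\operatorname{tr}\Sigma. $$

By theorem 87, if $n > p$ then $\mathcal{E}\{W^{-1}\} = \Sigma^{-1}/(n-p)$. Therefore,

$$ \mathcal{E}\{\operatorname{tr}W^{-1}\} = \frac{1}{n-p}\operatorname{tr}\Sigma^{-1}. $$

□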

F.4  Tague  and  Styan  Properties 

This section is included to demonstrate the usefulness of the statistical theory developed during this thesis research, and to collect work in the literature into a unified presentation. All of it is work by Tague [264], slightly reordered in places and with the derivation of some constants expanded.
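Theorem 89 Let $W \sim \mathcal{CW}_p(k, I)$, $U \in \mathcal{U}(p)$, and $A \in \mathbb{C}^{p\times p}$. Then

$$ g(A) = \mathcal{E}\{\operatorname{etr}(AW)\} = \mathcal{E}\{\operatorname{etr}(AU^{H}WU)\} = g(UAU^{H}). $$

This is from Tague [264].

Proof. By the definition of an expected value, we define $g(A)$ as follows:

$$ g(A) = \int_{W>0}\operatorname{etr}(AW)\,f_W(W)\,(dW) = \int_{W>0}\operatorname{etr}(AW)\,\frac{|\det W|^{k-p}\operatorname{etr}(-W)}{C\Gamma_p(k)}\,(dW). $$

Now consider what happens under a unitary similarity transformation:

$$ g(UAU^{H}) = \int_{W>0}\operatorname{etr}(UAU^{H}W)\,f_W(W)\,(dW) = \int_{W>0}\operatorname{etr}(AU^{H}WU)\,f_W(W)\,(dW) $$

by the cyclic property of the trace function. Now perform the change of variables $Y = U^{H}WU$, which has the inverse relation $W = UYU^{H}$. By corollary 7, the Jacobian of this transformation is 1. Thus

$$ g(UAU^{H}) = \int_{Y>0}\operatorname{etr}(AY)\,\frac{|\det(UYU^{H})|^{k-p}\operatorname{etr}(-UYU^{H})}{C\Gamma_p(k)}\,(dY) = \int_{Y>0}\operatorname{etr}(AY)\,\frac{|\det(U)\det(Y)\det(U^{H})|^{k-p}\operatorname{etr}(-U^{H}UY)}{C\Gamma_p(k)}\,(dY). $$

We note that $U^{H}U = I$ because $U \in \mathcal{U}(p)$, and also $|\det(U)\det(U^{H})| = 1$. Thus we have

$$ g(UAU^{H}) = \int_{Y>0}\operatorname{etr}(AY)\,f_W(Y)\,(dY) = g(A). $$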
□ 
Theorem 90 Let $W \sim \mathcal{CW}_p(k, \Sigma)$, $\Sigma > 0$, with deterministic matrix $A \in \mathbb{C}^{p\times p}$. Then $\mathcal{E}\{WAW\} = k^{2}\Sigma A\Sigma + k\operatorname{tr}(A\Sigma)\Sigma$. This result was proven for the complex case by Tague [264], motivated by Styan's treatment [262] of the problem for the real case. The result is not a simple extension of the real case.


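Proof. Let $W \sim \mathcal{CW}_p(k, I)$. Recall from lemma 58 that, for a random $W \in \mathbb{C}^{p\times p}$ whose function $g(T) = \mathcal{E}\{\operatorname{etr}(TW)\}$ satisfies $g(T) = g(UTU^{H})$ for fixed $T \in \mathbb{C}^{p\times p}$ (theorem 89), a moment generating function argument gives

$$ \mathcal{E}\{W_{ij}W_{lm}\} = b_1\delta_{ij}\delta_{lm} + b_2\delta_{im}\delta_{jl}, $$

where $\delta_{jk}$ is the delta function ($\delta_{jk} = 1$ if $j = k$ and $\delta_{jk} = 0$ if $j \neq k$) and $b_1$, $b_2$ are constants. By lemma 25,

$$ (WAW)_{im} = \sum_{j=1}^{p}\sum_{l=1}^{p}W_{ij}A_{jl}W_{lm}, $$

and thus

$$ \mathcal{E}\{(WAW)_{im}\} = \sum_{j=1}^{p}\sum_{l=1}^{p}A_{jl}(b_1\delta_{ij}\delta_{lm} + b_2\delta_{im}\delta_{jl}) = A_{im}b_1 + b_2\delta_{im}\sum_{j=1}^{p}A_{jj} = b_1A_{im} + \delta_{im}b_2\operatorname{tr}(A). $$

Then for the whole matrix,

$$ \mathcal{E}\{WAW\} = b_1A + b_2I\operatorname{tr}(A), $$

where $W \sim \mathcal{CW}_p(k, I)$.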
We need to evaluate $b_1$ and $b_2$. Suppose we let $A = e_ie_i^{H}$, where $e_i$ is the standard basis vector, which is a zero vector with a 1 in the $i$-th element. Then

$$ \mathcal{E}\{WAW\} = b_1e_ie_i^{H} + b_2I. $$

Further,

$$ e_i^{H}\mathcal{E}\{WAW\}e_i = \mathcal{E}\{e_i^{H}We_ie_i^{H}We_i\} = b_1 + b_2 = \mathcal{E}\{W_{ii}^{2}\}. $$

By corollary 17, $W_{ii} \sim \mathcal{CW}_1(k, 1)$. By theorem 53, then $2W_{ii} \sim \chi^{2}_{2k}(0)$. From properties of the chi-square distribution, theorem 142, we know $\mathcal{E}\{(2W_{ii})^{2}\} = 2k(2k+2)$, so $\mathcal{E}\{W_{ii}^{2}\} = k(k+1)$. Recall that it is $2W_{ii}$ that has the $\chi^{2}_{2k}(0)$ distribution, not $W_{ii}$. Substituting into our earlier result,

$$ \mathcal{E}\{W_{ii}^{2}\} = b_1 + b_2 = k^{2} + k. $$

Now consider off-diagonal elements of $WAW$. Let $A = e_ie_j^{H}$ where $i \neq j$. Since $\Sigma = I$, theorem 56 tells us that the set of $\{W_{ii}\}$ are independent random variables. Thus

$$ e_i^{H}\mathcal{E}\{WAW\}e_j = \mathcal{E}\{W_{ii}W_{jj}\} = k\cdot k = k^{2}, $$

and also

$$ e_i^{H}\mathcal{E}\{WAW\}e_j = b_1 + b_2\operatorname{tr}(e_ie_j^{H}) = b_1. $$

Therefore $b_1 = k^{2}$, which implies $k^{2} + b_2 = k^{2} + k$, which gives us $b_2 = k$. Then

$$ \mathcal{E}\{WAW\} = k^{2}A + kI\operatorname{tr}(A) \qquad (F.2) $$

where $W \sim \mathcal{CW}_p(k, I)$.

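We now want to consider the case when $W \sim \mathcal{CW}_p(k, \Sigma)$ for $\Sigma^{H} = \Sigma > 0$. By theorem 119, the decomposition $\Sigma = GG^{H}$ exists, and $G^{-1}$ and $G^{-H}$ exist since $\Sigma > 0$. By theorem 54,

$$ G^{-1}WG^{-H} \sim \mathcal{CW}_p(k, G^{-1}\Sigma G^{-H}) = \mathcal{CW}_p(k, I). $$

Applying equation F.2,

$$ \mathcal{E}\{G^{-1}WG^{-H}BG^{-1}WG^{-H}\} = k^{2}B + kI\operatorname{tr}(B) $$

for $B \in \mathbb{C}^{p\times p}$. Then

$$ G\,\mathcal{E}\{G^{-1}WG^{-H}BG^{-1}WG^{-H}\}G^{H} = \mathcal{E}\{WG^{-H}BG^{-1}W\} = k^{2}GBG^{H} + kGIG^{H}\operatorname{tr}(B) = k^{2}GBG^{H} + kGG^{H}\operatorname{tr}(B). $$

Let $A = G^{-H}BG^{-1}$. Then $B = G^{H}AG$, and

$$ \mathcal{E}\{WAW\} = k^{2}GG^{H}AGG^{H} + kGG^{H}\operatorname{tr}(G^{H}AG) = k^{2}\Sigma A\Sigma + k\Sigma\operatorname{tr}(A\Sigma), $$

which is the main result. □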
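Equation F.2 lends itself to a quick numerical check. This sketch assumes $p = 2$, $\Sigma = I$, $k = 3$, and an arbitrary non-Hermitian $A$; it estimates $\mathcal{E}\{WAW\}$ by simulation and compares it with $k^{2}A + kI\operatorname{tr}(A)$. The general-$\Sigma$ form of theorem 90 follows from F.2 by the congruence argument above, so it is not simulated separately. The function names are illustrative.

```python
import math
import random


def mc_waw(k, a, trials, seed=3):
    """Monte Carlo estimate of E{W A W} for W ~ CW_2(k, I)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # CN(0, 1): real and imaginary parts have variance 1/2
    acc = [[0j, 0j], [0j, 0j]]
    for _ in range(trials):
        w = [[0j, 0j], [0j, 0j]]
        for _ in range(k):
            z = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(2)]
            for i in range(2):
                for j in range(2):
                    w[i][j] += z[i] * z[j].conjugate()
        aw = [[sum(a[i][t] * w[t][j] for t in range(2)) for j in range(2)] for i in range(2)]
        for i in range(2):
            for j in range(2):
                acc[i][j] += sum(w[i][t] * aw[t][j] for t in range(2)) / trials
    return acc


def f2_prediction(k, a):
    """Equation F.2: E{W A W} = k^2 A + k I tr(A) when W ~ CW_p(k, I)."""
    tr = a[0][0] + a[1][1]
    return [[k * k * a[i][j] + (k * tr if i == j else 0) for j in range(2)] for i in range(2)]
```

For `a = [[1 + 0j, 2 + 0j], [0j, 1j]]` and `k = 3`, the prediction is `[[12+3j, 18], [0, 3+12j]]`; a 20000-trial estimate should match each entry to within roughly one unit, since the fourth-order moments make this estimator fairly noisy.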

Corollary 24 Let $W \sim \mathcal{CW}_p(k, \Sigma)$. Then $\mathcal{E}\{W^{2}\} = k^{2}\Sigma^{2} + k\Sigma\operatorname{tr}(\Sigma)$. This simple special case was first produced for the real variables case by Styan, and then rederived for the complex case by Tague.

Proof. Let $A = I$ in theorem 90. □


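Corollary 25 Let $W \sim \mathcal{CW}_p(k, \Sigma)$ and $a \in \mathbb{C}^{p}$. Then $\mathcal{E}\{Waa^{H}W\} = k^{2}\Sigma aa^{H}\Sigma + k\,a^{H}\Sigma a\,\Sigma$.

Proof. Let $A = aa^{H}$ in theorem 90. Note that $\operatorname{tr}(aa^{H}\Sigma) = \operatorname{tr}(a^{H}\Sigma a) = a^{H}\Sigma a$ is a scalar. □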

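Corollary 26 Let $W \sim \mathcal{CW}_p(k, \Sigma)$ and $a \in \mathbb{C}^{p}$. Then $\operatorname{var}(Wa) = k\,a^{H}\Sigma a\,\Sigma$.

Proof. $\mathcal{E}\{Wa\} = \mathcal{E}\{W\}a = k\Sigma a$ by theorem 52. By definition,

$$ \operatorname{var}(Wa) = \mathcal{E}\{(Wa)(Wa)^{H}\} - \mathcal{E}\{Wa\}\,\mathcal{E}\{(Wa)^{H}\} = \mathcal{E}\{Waa^{H}W\} - k^{2}\Sigma aa^{H}\Sigma = k^{2}\Sigma aa^{H}\Sigma + k\,a^{H}\Sigma a\,\Sigma - k^{2}\Sigma aa^{H}\Sigma $$

from corollary 25. The final result is

$$ \operatorname{var}(Wa) = k\,a^{H}\Sigma a\,\Sigma. $$

□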

Theorem 91 Let $W^{-1} = V \sim \mathcal{CIW}_p(k, I)$, $U \in \mathcal{U}(p)$, and $A \in \mathbb{C}^{p\times p}$. Then $g(A) = \mathcal{E}\{\operatorname{etr}(AV)\} = \mathcal{E}\{\operatorname{etr}(AU^{H}VU)\} = g(UAU^{H})$. This property was taken from Tague [264] with permission.

Proof. This is analogous to Tague's proof for the case of $Z \sim \mathcal{CW}_p(k, I)$. We define $g(A)$ as

$$ g(A) = \int_{V>0}\operatorname{etr}(AV)\,\frac{|\det V|^{-(k+p)}\operatorname{etr}(-V^{-1})}{C\Gamma_p(k)}\,(dV) $$

by the definition of an expected value, where the density function comes from theorem 85 for the complex inverted Wishart distribution. Now we consider what happens to $g$ when $UAU^{H}$ is used as the argument. Following the definition,

$$ g(UAU^{H}) = \int_{V>0}\operatorname{etr}(UAU^{H}V)\,f_V(V)\,(dV) = \int_{V>0}\operatorname{etr}(AU^{H}VU)\,f_V(V)\,(dV), $$

since $\operatorname{tr}(XY) = \operatorname{tr}(YX)$ is a general property of the trace function. Now perform the change of variables $Y = U^{H}VU$. The inverse transform is $V = UYU^{H}$, and the Jacobian of the transformation is 1, by corollary 7. Thus

$$ g(UAU^{H}) = \int_{Y>0}\operatorname{etr}(AY)\,\frac{|\det(UYU^{H})|^{-(k+p)}\operatorname{etr}\!\left(-(UYU^{H})^{-1}\right)}{C\Gamma_p(k)}\,(dY) = \int_{Y>0}\operatorname{etr}(AY)\,\frac{|\det Y|^{-(k+p)}\operatorname{etr}(-Y^{-1})}{C\Gamma_p(k)}\,(dY) = g(A), $$

where the middle step uses $(UYU^{H})^{-1} = UY^{-1}U^{H}$ and $|\det U| = 1$.


□ 


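Theorem 92 Let $W_1 \sim \mathcal{CW}_p(k_1, \Sigma)$ and $W_2 \sim \mathcal{CW}_p(k_2, \Sigma)$ be independent. If $k_1 > p$, then

$$ \mathcal{E}\{W_2W_1^{-1}W_2\} = \frac{k_2(k_2+p)}{k_1-p}\,\Sigma. $$

This is a complexification by Tague [264] of a corollary that Styan [262] provided for the case of real variables.

Proof. First note that

$$ \mathcal{E}\{W_2W_1^{-1}W_2\} = \mathcal{E}_{W_1}\left\{\mathcal{E}\{W_2W_1^{-1}W_2 \mid W_1\}\right\}. $$

When $W_1$ is fixed, theorem 90 (with $A = W_1^{-1}$) tells us

$$ \mathcal{E}\{W_2W_1^{-1}W_2 \mid W_1\} = k_2^{2}\,\Sigma W_1^{-1}\Sigma + k_2\operatorname{tr}(W_1^{-1}\Sigma)\,\Sigma. $$

From theorem 87, $\mathcal{E}\{W_1^{-1}\} = \Sigma^{-1}/(k_1-p)$. We require $k_1 > p$ to ensure the denominator is not zero. Also, $k_1 > p$ ensures $W_1^{-1}$ exists. Continuing, taking the expectation with respect to $W_1$, we obtain

$$ \mathcal{E}\{W_2W_1^{-1}W_2\} = \frac{k_2^{2}}{k_1-p}\,\Sigma\Sigma^{-1}\Sigma + \frac{k_2}{k_1-p}\operatorname{tr}(\Sigma^{-1}\Sigma)\,\Sigma = \frac{k_2^{2} + k_2p}{k_1-p}\,\Sigma = \frac{k_2(k_2+p)}{k_1-p}\,\Sigma. $$

□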
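The conditioning argument of theorem 92 can be checked by simulation. The sketch below assumes $p = 2$, $\Sigma = I$, $k_1 = 8$, and $k_2 = 3$, for which the theorem predicts $\mathcal{E}\{W_2W_1^{-1}W_2\} = (3\cdot 5/6)I = 2.5I$. The helper names are hypothetical and only the standard library is used.

```python
import math
import random


def cw2_identity(rng, k):
    """One draw of W ~ CW_2(k, I) as a 2x2 nested list."""
    s = math.sqrt(0.5)
    w = [[0j, 0j], [0j, 0j]]
    for _ in range(k):
        z = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(2)]
        for i in range(2):
            for j in range(2):
                w[i][j] += z[i] * z[j].conjugate()
    return w


def mul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def inv2(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def mc_w2_w1inv_w2(k1, k2, trials, seed=4):
    """Monte Carlo estimate of E{W2 W1^{-1} W2}, independent W1 ~ CW_2(k1, I), W2 ~ CW_2(k2, I)."""
    rng = random.Random(seed)
    acc = [[0j, 0j], [0j, 0j]]
    for _ in range(trials):
        w1 = cw2_identity(rng, k1)
        w2 = cw2_identity(rng, k2)
        m = mul2(mul2(w2, inv2(w1)), w2)
        for i in range(2):
            for j in range(2):
                acc[i][j] += m[i][j] / trials
    return acc


def theorem92_scale(k1, k2, p=2):
    """Theorem 92: E{W2 W1^{-1} W2} = k2 (k2 + p) / (k1 - p) * Sigma, for k1 > p."""
    return k2 * (k2 + p) / (k1 - p)
```

Note that the naive guess $k_2^{2}\Sigma/(k_1-p)$, obtained by ignoring the trace term of theorem 90, would give $1.5I$ here, well away from the simulated diagonal.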

Theorem 93 If $W_1$ and $W_2$ are independent and $W_i \sim \mathcal{CW}_p(n_i, \Sigma)$, then

$$ W_1 + W_2 \sim \mathcal{CW}_p(n_1 + n_2, \Sigma). $$

This is a complexification of Arnold's theorem 17.15(f), which was stated without proof.

Proof. From theorem 75, the characteristic function of the associated random variable $W_i$ is

$$ \Phi_{W_i}(T) = [\det(I_p - i\Sigma T)]^{-n_i}. $$

Since $W_1$ and $W_2$ are independent, the characteristic function of the distribution of the sum is the product of the individual characteristic functions. Thus

$$ \Phi_{W_1+W_2}(T) = \Phi_{W_1}(T)\,\Phi_{W_2}(T) = [\det(I_p - i\Sigma T)]^{-n_1}\,[\det(I_p - i\Sigma T)]^{-n_2} = [\det(I_p - i\Sigma T)]^{-(n_1+n_2)}. $$

This is the characteristic function of the associated random variable corresponding to the complex Wishart distribution $\mathcal{CW}_p(n_1 + n_2, \Sigma)$. □


Theorem 94 Let $A \sim \mathcal{CW}_p(n, I_p)$ and $B \sim \mathcal{CN}_{p\times m}(0, I_p, I_m)$ be independent complex random variables. Let $Z = A + BB^{H} = CC^{H}$ and $B = CU$. Then $Z \sim \mathcal{CW}_p(m + n, I_p)$, and the density of $U$ is given by

$$ g(U) = \frac{C\Gamma_p(m+n)}{\pi^{mp}\,C\Gamma_p(n)}\,|\det(I - UU^{H})|^{n-p}. $$

This is a complex version of the derivation given by Anderson (p. 302) [26] for the real variables case.

Proof. The concept of solving for the joint distribution of $Z$ and $U$ and then recognizing their independence is due to Anderson. From theorem 93, since $BB^{H} \sim \mathcal{CW}_p(m, I_p)$, we know

$$ Z \sim \mathcal{CW}_p(n + m, I_p). $$

The joint distribution of $A$ and $B$ is

$$ f(A, B) = \mathcal{CW}_p(n, I)\cdot\mathcal{CN}_{p\times m}(0, I_p, I_m) = \frac{|\det A|^{n-p}\operatorname{etr}(-A)}{C\Gamma_p(n)}\cdot\frac{\operatorname{etr}(-BB^{H})}{\pi^{mp}}. $$

Note that the density of $Z$ is

$$ f(Z) = \frac{|\det Z|^{m+n-p}\operatorname{etr}(-Z)}{C\Gamma_p(m+n)}. $$

We want to find the joint density of $Z$ and $U$. Begin with $f(A, B)$ and manipulate it until we have a function that includes $f(A + BB^{H})$ as a factor:

$$ f(A, B) = \frac{|\det(A+BB^{H})|^{m+n-p}\operatorname{etr}[-(A+BB^{H})]}{C\Gamma_p(m+n)}\cdot\frac{C\Gamma_p(m+n)}{\pi^{mp}\,C\Gamma_p(n)}\cdot\frac{|\det A|^{n-p}}{|\det(A+BB^{H})|^{n-p}\,|\det(A+BB^{H})|^{m}}. $$

Note that the term

$$ \frac{|\det(A+BB^{H})|^{m+n-p}\operatorname{etr}[-(A+BB^{H})]}{C\Gamma_p(m+n)} = \frac{|\det Z|^{m+n-p}\operatorname{etr}(-Z)}{C\Gamma_p(m+n)} = \mathcal{CW}_p(m + n, I) $$

already accounts for the change of variables from $(A + BB^{H})$ to $Z$. Thus we only need to include the Jacobian for the change of variables from $B$ to $U$, which is $J(B \to U) = |\det C|^{2m}$. Thus

$$ g(Z, U) = \frac{|\det Z|^{m+n-p}\operatorname{etr}(-Z)}{C\Gamma_p(m+n)}\cdot\frac{C\Gamma_p(m+n)}{\pi^{mp}\,C\Gamma_p(n)}\cdot\frac{|\det A|^{n-p}\,|\det C|^{2m}}{|\det(A+BB^{H})|^{n-p}\,|\det(A+BB^{H})|^{m}}, $$

where we still have the substitutions to complete. For this, we still have an identity to compute. From

$$ Z = A + BB^{H} = CC^{H} $$

we have

$$ A = Z - BB^{H} = CC^{H} - (CU)(CU)^{H} = C(I - UU^{H})C^{H}. $$

Then

$$ \frac{|\det A|}{|\det(A + BB^{H})|} = \frac{|\det C|^{2}\,|\det(I - UU^{H})|}{|\det C|^{2}} = |\det(I - UU^{H})| $$

and

$$ \frac{|\det C|^{2}}{|\det(A + BB^{H})|} = 1. $$

We substitute our identities to obtain the joint density of $Z$ and $U$:

$$ g(Z, U) = \frac{|\det Z|^{m+n-p}\operatorname{etr}(-Z)}{C\Gamma_p(m+n)}\cdot\frac{C\Gamma_p(m+n)}{C\Gamma_p(n)\,\pi^{mp}}\,|\det(I - UU^{H})|^{n-p}. $$

We note that $Z$ and $U$ are independent by the Neyman-Fisher factorization theorem, since $g(Z, U)$ factors into a function of $Z$ alone times a function of $U$ alone. We know the density of $Z$ is $\mathcal{CW}_p(m + n, I)$, and thus the density of $U$ is

$$ g(U) = \frac{C\Gamma_p(m+n)}{\pi^{mp}\,C\Gamma_p(n)}\,|\det(I - UU^{H})|^{n-p}. $$


□ 
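One way to confirm the normalizing constant of theorem 94 is to integrate $g(U)$ in the simplest case $p = 1$, where $U$ is a $1\times m$ row vector and $\det(I - UU^{H}) = 1 - \|U\|^{2}$. Because $g$ then depends only on $r = \|U\|$, the integral over the unit ball of $\mathbb{C}^{m} = \mathbb{R}^{2m}$ reduces to a radial integral weighted by the sphere area $2\pi^{m}/\Gamma(m)$. The sketch below, a minimal check under that $p = 1$ assumption, evaluates it with the midpoint rule; $C\Gamma_p(a) = \pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(a-i+1)$ as used throughout this appendix, and the function names are illustrative.

```python
import math


def complex_multivariate_gamma(p, a):
    """C Gamma_p(a) = pi^{p(p-1)/2} * prod_{i=1}^{p} Gamma(a - i + 1)."""
    return math.pi ** (p * (p - 1) / 2) * math.prod(math.gamma(a - i + 1) for i in range(1, p + 1))


def theorem94_constant(p, m, n):
    """Normalizing constant C Gamma_p(m + n) / (pi^{mp} C Gamma_p(n)) of g(U)."""
    return complex_multivariate_gamma(p, m + n) / (math.pi ** (m * p) * complex_multivariate_gamma(p, n))


def total_mass_p1(m, n, steps=100_000):
    """Integral of g(U) over the unit ball of C^m for p = 1 (exponent n - p = n - 1)."""
    c = theorem94_constant(1, m, n)
    area = 2.0 * math.pi ** m / math.gamma(m)  # surface area of the unit sphere in R^{2m}
    h = 1.0 / steps
    total = 0.0
    for idx in range(steps):
        r = (idx + 0.5) * h  # midpoint of the idx-th radial cell
        total += c * (1.0 - r * r) ** (n - 1) * area * r ** (2 * m - 1) * h
    return total
```

For $p = m = 1$ the constant reduces to $\Gamma(n+1)/(\pi\Gamma(n)) = n/\pi$, and in general the radial integral collapses to $\Gamma(n+m)B(m, n)/(\Gamma(n)\Gamma(m)) = 1$, which is what `total_mass_p1` reproduces numerically.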


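Theorem 95 Let $W \sim \mathcal{CW}_p(n, \Sigma)$, $\Sigma > 0$, $n > p$, and $A \in \mathbb{C}^{q\times p}$ with $\operatorname{rank}(A) = q$. Then $(AW^{-1}A^{H})^{-1} \sim \mathcal{CW}_q(n - p + q, (A\Sigma^{-1}A^{H})^{-1})$. This is a complexification of Arnold's theorem 17.15(g), which was stated without proof. This is also similar to Muirhead's theorem 3.2.11 for the real variables case.

Proof. This follows Muirhead's proof, except that this version accounts for the structure of complex variables.

By theorem 119, there exists a positive definite complex matrix $\Sigma^{1/2}$ such that $\Sigma = (\Sigma^{1/2})(\Sigma^{1/2})^{H}$. Thus, $\Sigma^{-1} = \Sigma^{-H/2}\Sigma^{-1/2}$. Let $B = \Sigma^{-1/2}W\Sigma^{-H/2}$, which implies that $W = \Sigma^{1/2}B\Sigma^{H/2}$. By theorem 54,

$$ B \sim \mathcal{CW}_p(n, \Sigma^{-1/2}\Sigma\Sigma^{-H/2}) = \mathcal{CW}_p(n, I_p). $$

Let $R = A\Sigma^{-H/2}$, which implies $A = R\Sigma^{H/2}$. Then $AW^{-1}A^{H} = RB^{-1}R^{H}$.

By theorem 125, $R$ can be written as $R = L(I_q, 0)H$, where $H$ is a $p\times p$ unitary matrix and $L$ is a $q\times q$ positive definite matrix. Then

$$ (AW^{-1}A^{H})^{-1} = (RB^{-1}R^{H})^{-1} = \left[L(I_q, 0)HB^{-1}H^{H}\begin{pmatrix}I_q\\ 0\end{pmatrix}L^{H}\right]^{-1} = L^{-H}\left[(I_q, 0)\,C^{-1}\begin{pmatrix}I_q\\ 0\end{pmatrix}\right]^{-1}L^{-1}, $$

where

$$ C = HBH^{H} \sim \mathcal{CW}_p(n, HI_pH^{H}) = \mathcal{CW}_p(n, I_p) $$

by theorem 54, since $H$ is unitary. Let

$$ D = C^{-1} = \begin{pmatrix}D_{11} & D_{12}\\ D_{21} & D_{22}\end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix}C_{11} & C_{12}\\ C_{21} & C_{22}\end{pmatrix}, $$

where $C_{11}$ and $D_{11}$ are $q\times q$. Then

$$ (AW^{-1}A^{H})^{-1} = (RB^{-1}R^{H})^{-1} = L^{-H}D_{11}^{-1}L^{-1}. $$

Recall from lemma 34 (the partitioned matrix right inverse) that $D_{11}^{-1} = C_{11} - C_{12}C_{22}^{-1}C_{21}$. $D_{11}^{-1}$ corresponds to $V$ of theorem 84. By that theorem, $D_{11}^{-1} \sim \mathcal{CW}_q(n - p + q, I_q)$. Then by theorem 54,

$$ L^{-H}D_{11}^{-1}L^{-1} \sim \mathcal{CW}_q(n - p + q, L^{-H}I_qL^{-1}) = \mathcal{CW}_q\!\left(n - p + q, (LL^{H})^{-1}\right). $$

Consider $(A\Sigma^{-1}A^{H})^{-1}$. Expanding,

$$ (A\Sigma^{-1}A^{H})^{-1} = (R\Sigma^{H/2}\Sigma^{-H/2}\Sigma^{-1/2}\Sigma^{1/2}R^{H})^{-1} = (RR^{H})^{-1} = \left[L(I_q, 0)HH^{H}\begin{pmatrix}I_q\\ 0\end{pmatrix}L^{H}\right]^{-1} = (LL^{H})^{-1}. $$

Therefore,

$$ (AW^{-1}A^{H})^{-1} \sim \mathcal{CW}_q\!\left(n - p + q, (A\Sigma^{-1}A^{H})^{-1}\right). $$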


□ 
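The loss of $p - q$ degrees of freedom in theorem 95 is easy to see numerically in the scalar case $q = 1$. The sketch below assumes $p = 2$, $\Sigma = I$, and the illustrative row vector $a = (1, i)$, so $(a\Sigma^{-1}a^{H})^{-1} = 1/2$ and the theorem predicts $\mathcal{E}\{(aW^{-1}a^{H})^{-1}\} = (n - p + q)/2$. With $n = 6$ that is $2.5$, not the $n/2 = 3$ one would get if no degrees of freedom were lost. Function names are hypothetical.

```python
import math
import random


def mc_theorem95_q1(n, a, trials, seed=5):
    """Monte Carlo mean of (a W^{-1} a^H)^{-1} for W ~ CW_2(n, I) and a 1 x 2 row vector a."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    total = 0.0
    for _ in range(trials):
        w = [[0j, 0j], [0j, 0j]]
        for _ in range(n):
            z = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(2)]
            for i in range(2):
                for j in range(2):
                    w[i][j] += z[i] * z[j].conjugate()
        det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
        winv = [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]
        quad = sum(a[i] * winv[i][j] * a[j].conjugate() for i in range(2) for j in range(2))
        total += 1.0 / quad.real  # a W^{-1} a^H is real and positive
    return total / trials


def theorem95_mean(n, a, p=2, q=1):
    """Mean of CW_q(n - p + q, (a Sigma^{-1} a^H)^{-1}) with Sigma = I and q = 1."""
    scale = 1.0 / sum(abs(x) ** 2 for x in a)
    return (n - p + q) * scale
```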


Lemma 23 Let $W^{-1} \sim \mathcal{CIW}_p(k, I_p)$, $k > p + 1$, and $A \in \mathbb{C}^{p\times p}$. Then

$$ \mathcal{E}\{W^{-1}AW^{-1}\} = d_1A + d_2I\operatorname{tr}(A), $$

where

$$ d_1 = \frac{1}{(k-p+1)(k-p-1)} $$

and

$$ d_2 = \frac{1}{(k-p+1)(k-p)(k-p-1)}. $$

This is Styan's corollary 4 (part iii) [262] and theorem 3, which have been complexified by Tague [264].

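Proof. By theorem 91, $W^{-1}$ satisfies the property $g(A) = g(UAU^{H})$ for unitary $U \in \mathcal{U}(p)$. By lemma 58,

$$ \mathcal{E}\{W^{ij}W^{lm}\} = d_1\delta_{ij}\delta_{lm} + d_2\delta_{im}\delta_{jl}, $$

where $W^{ij}$ is the element $(i, j)$ of $W^{-1}$. Recall that

$$ (W^{-1}AW^{-1})_{im} = \sum_{j=1}^{p}\sum_{l=1}^{p}W^{ij}A_{jl}W^{lm}, $$

and thus

$$ \mathcal{E}\{(W^{-1}AW^{-1})_{im}\} = \sum_{j=1}^{p}\sum_{l=1}^{p}A_{jl}(d_1\delta_{ij}\delta_{lm} + d_2\delta_{im}\delta_{jl}) = A_{im}d_1 + d_2\delta_{im}\sum_{j=1}^{p}A_{jj} = d_1A_{im} + \delta_{im}d_2\operatorname{tr}(A). $$

Then

$$ \mathcal{E}\{W^{-1}AW^{-1}\} = d_1A + d_2I\operatorname{tr}(A), $$

where $W^{-1} \sim \mathcal{CIW}_p(k, I)$.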

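By lemma 40,

$$ W^{ii} = e_i^{H}(Y^{H}Y)^{-1}e_i + \frac{|e_i^{H}(Y^{H}Y)^{-1}Y^{H}Z|^{2}}{Z^{H}\{I - Y(Y^{H}Y)^{-1}Y^{H}\}Z}, \qquad i < p, $$

where $W$ is partitioned as

$$ W = \begin{pmatrix}Y^{H}Y & Y^{H}Z\\ Z^{H}Y & Z^{H}Z\end{pmatrix}, $$

with $Z$ being a column vector, and $e_i$ is the standard basis vector consisting of all zeros except a 1 in position $i$. Also,

$$ W^{pp} = \frac{1}{Z^{H}\{I - Y(Y^{H}Y)^{-1}Y^{H}\}Z}. $$

This product is

$$ W^{ii}W^{pp} = e_i^{H}(Y^{H}Y)^{-1}e_i\,W^{pp} + (W^{pp})^{2}\,|e_i^{H}(Y^{H}Y)^{-1}Y^{H}Z|^{2}. $$

We want to find $\mathcal{E}\{W^{ii}W^{pp}\}$.

Since $W^{-1} \sim \mathcal{CIW}_p(k, I)$, then $W \sim \mathcal{CW}_p(k, I)$. By theorem 64, elements on the diagonal of $W^{-1}$ have the distributional property

$$ \frac{2}{W^{jj}} \sim \chi^{2}_{2(k-p+1)}(0). $$

In corollary 23, if we let

$$ X = W_{22} - W_{21}W_{11}^{-1}W_{12} = Z^{H}Z - Z^{H}Y(Y^{H}Y)^{-1}Y^{H}Z, $$

we then see that $X$ is independent of $W_{11} = Y^{H}Y$, and hence $X$ is independent of $Y$. Since $X = (W^{pp})^{-1}$, $W^{pp}$ is independent of $Y$. Then

$$ \mathcal{E}\{W^{ii}W^{pp}\} = \mathcal{E}\{e_i^{H}(Y^{H}Y)^{-1}e_i\}\,\mathcal{E}\{W^{pp}\} + \mathcal{E}\{(W^{pp})^{2}\,|e_i^{H}(Y^{H}Y)^{-1}Y^{H}Z|^{2}\}. $$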
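Now look at the first term. From theorem 87 we know $\mathcal{E}\{W^{pp}\} = 1/(k-p)$. By theorem 55,

$$ Y^{H}Y = W_{11} \sim \mathcal{CW}_{p-1}(k, I_{p-1}). $$

Again applying theorem 87, we know

$$ \mathcal{E}\{(Y^{H}Y)^{-1}\} = \frac{1}{k-p+1}\,I_{p-1}. $$

Substituting these results, we find

$$ e_i^{H}\mathcal{E}\{(Y^{H}Y)^{-1}\}e_i\,\mathcal{E}\{W^{pp}\} = \frac{1}{(k-p+1)(k-p)} $$

for $k > p$.

Now consider the computation

$$ \mathcal{E}\{(W^{pp})^{2}\,|e_i^{H}(Y^{H}Y)^{-1}Y^{H}Z|^{2}\} = \mathcal{E}_Y\left\{\mathcal{E}\{(W^{pp})^{2}\,|e_i^{H}(Y^{H}Y)^{-1}Y^{H}Z|^{2} \mid Y\}\right\}. $$

Given $Y$, $W^{pp}$ depends on $Z$ only through $\{I - Y(Y^{H}Y)^{-1}Y^{H}\}Z$, which is independent of $Y^{H}Z$, so the conditional expectation factors. Since

$$ \frac{2}{W^{pp}} \sim \chi^{2}_{2(k-p+1)}(0), $$

from the distribution we know

$$ \mathcal{E}\left\{\left(\frac{W^{pp}}{2}\right)^{2}\right\} = \frac{1}{[2(k-p+1)-2][2(k-p+1)-4]} = \frac{1}{[2(k-p)][2(k-p)-2]} = \frac{1}{4(k-p)(k-p-1)}. $$

Therefore

$$ \mathcal{E}\{(W^{pp})^{2}\} = \frac{1}{(k-p)(k-p-1)} \qquad (F.3) $$

for $k > p + 1$. Continuing,

$$ \mathcal{E}\{|e_i^{H}(Y^{H}Y)^{-1}Y^{H}Z|^{2} \mid Y\} = e_i^{H}(Y^{H}Y)^{-1}Y^{H}\,\mathcal{E}\{ZZ^{H}\}\,Y(Y^{H}Y)^{-1}e_i. $$

Since $W \sim \mathcal{CW}_p(k, I)$, then $Z \sim \mathcal{CN}_k(0, I)$ by theorem 55 and theorem 53. By theorem 52, $\mathcal{E}\{ZZ^{H}\} = I$. Thus

$$ e_i^{H}(Y^{H}Y)^{-1}Y^{H}Y(Y^{H}Y)^{-1}e_i = e_i^{H}(Y^{H}Y)^{-1}e_i. $$

We substitute this back in to obtain

$$ \mathcal{E}\{(W^{pp})^{2}\,|e_i^{H}(Y^{H}Y)^{-1}Y^{H}Z|^{2}\} = \mathcal{E}\{(W^{pp})^{2}\}\,e_i^{H}\mathcal{E}\{(Y^{H}Y)^{-1}\}e_i = \frac{1}{(k-p)(k-p-1)}\cdot\frac{1}{k-p+1}. $$

Adding our results, we get

$$ \mathcal{E}\{W^{ii}W^{pp}\} = \frac{1}{(k-p+1)(k-p)} + \frac{1}{(k-p+1)(k-p)(k-p-1)} = \frac{1}{(k-p+1)(k-p)}\left(1 + \frac{1}{k-p-1}\right) = \frac{1}{(k-p+1)(k-p-1)}. \qquad (F.4) $$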


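Now calculate $\mathcal{E}\{W^{-1}AW^{-1}\}$. Consider

$$ \mathcal{E}\{(W^{ii})^{2}\} = \mathcal{E}\{e_i^{H}W^{-1}e_ie_i^{H}W^{-1}e_i\}. $$

When $k > p + 1$, if $A = e_ie_i^{H}$, then

$$ e_i^{H}\left[d_1e_ie_i^{H} + d_2\operatorname{tr}(e_ie_i^{H})I\right]e_i = d_1 + d_2 = \mathcal{E}\{(W^{ii})^{2}\} = \frac{1}{(k-p)(k-p-1)}, \qquad (F.5) $$

using F.3, since every diagonal element of $W^{-1}$ has the same distribution. Also,

$$ \mathcal{E}\{W^{ii}W^{jj}\} = \mathcal{E}\{e_i^{H}W^{-1}e_ie_j^{H}W^{-1}e_j\}. $$

Now let $A = e_ie_j^{H}$ with $i \neq j$. Then we find

$$ e_i^{H}\left[d_1e_ie_j^{H} + d_2\operatorname{tr}(e_ie_j^{H})I\right]e_j = d_1 = \mathcal{E}\{W^{ii}W^{jj}\}. $$

From equation F.4 (which, by symmetry, holds for any pair $i \neq j$), $d_1 = 1/[(k-p+1)(k-p-1)]$. Substitute into equation F.5 to get

$$ d_2 = \frac{1}{(k-p)(k-p-1)} - \frac{1}{(k-p+1)(k-p-1)} = \frac{1}{k-p-1}\left(\frac{1}{k-p} - \frac{1}{k-p+1}\right) = \frac{1}{(k-p+1)(k-p)(k-p-1)}. $$

Therefore, for $W^{-1} \sim \mathcal{CIW}_p(k, I_p)$ and $k > p + 1$,

$$ \mathcal{E}\{W^{-1}AW^{-1}\} = \frac{A}{(k-p+1)(k-p-1)} + \frac{I\operatorname{tr}(A)}{(k-p+1)(k-p)(k-p-1)}. $$

□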
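The two moments that pin down $d_1$ and $d_2$ (equations F.4 and F.5) can be estimated directly. This sketch assumes $p = 2$, $\Sigma = I$, and $k = 10$, for which $d_1 = 1/63$ and $d_1 + d_2 = 1/56$; it uses exact `Fraction` arithmetic for the closed forms and a standard-library Monte Carlo for the moments. The names are illustrative, and the 6% tolerance reflects the heavy tails of inverse-Wishart entries.

```python
import math
import random
from fractions import Fraction


def lemma23_coefficients(k, p):
    """Exact d1, d2 of Lemma 23 (valid for k > p + 1)."""
    d1 = Fraction(1, (k - p + 1) * (k - p - 1))
    d2 = Fraction(1, (k - p + 1) * (k - p) * (k - p - 1))
    return d1, d2


def mc_inverse_second_moments(k, trials, seed=6):
    """Monte Carlo estimates of E{(W^{11})^2} and E{W^{11} W^{22}} for W ~ CW_2(k, I)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    sq = cross = 0.0
    for _ in range(trials):
        w11 = w22 = 0.0
        w12 = 0j
        for _ in range(k):
            z1 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            z2 = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            w11 += abs(z1) ** 2
            w22 += abs(z2) ** 2
            w12 += z1 * z2.conjugate()
        det = w11 * w22 - abs(w12) ** 2
        inv11, inv22 = w22 / det, w11 / det  # diagonal entries of W^{-1}
        sq += inv11 ** 2 / trials
        cross += inv11 * inv22 / trials
    return sq, cross
```

By F.5 the first estimate targets $d_1 + d_2$, and by F.4 the second targets $d_1$.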


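Theorem 96 Let $W^{-1} \sim \mathcal{CIW}_p(k, \Sigma)$ and $A \in \mathbb{C}^{p\times p}$. Then for $k > p + 1$ we have

$$ \mathcal{E}\{W^{-1}AW^{-1}\} = \frac{\Sigma^{-1}A\Sigma^{-1}}{(k-p+1)(k-p-1)} + \frac{\operatorname{tr}(A\Sigma^{-1})\,\Sigma^{-1}}{(k-p+1)(k-p)(k-p-1)}. $$

This is Tague's complexification [264] of Styan's corollary 14 [262].

Proof. We start with the result for $V^{-1} \sim \mathcal{CIW}_p(k, I_p)$, where

$$ \mathcal{E}\{V^{-1}BV^{-1}\} = d_1B + d_2I\operatorname{tr}(B). $$

Let $\Sigma = GG^{H}$. Then $W \sim \mathcal{CW}_p(k, \Sigma)$ implies $V = G^{-1}WG^{-H} \sim \mathcal{CW}_p(k, I)$ by theorem 54, so that $V^{-1} = G^{H}W^{-1}G$. Thus

$$ \mathcal{E}\{G^{H}W^{-1}GBG^{H}W^{-1}G\} = d_1B + d_2I\operatorname{tr}(B). $$

Premultiply this by $G^{-H}$ and postmultiply by $G^{-1}$. We get

$$ G^{-H}G^{H}\,\mathcal{E}\{W^{-1}GBG^{H}W^{-1}\}\,GG^{-1} = d_1G^{-H}BG^{-1} + d_2G^{-H}IG^{-1}\operatorname{tr}(B), $$

which we rewrite as

$$ \mathcal{E}\{W^{-1}GBG^{H}W^{-1}\} = d_1G^{-H}BG^{-1} + d_2\Sigma^{-1}\operatorname{tr}(B). $$

Let $A = GBG^{H}$, which implies $B = G^{-1}AG^{-H}$. Then

$$ \mathcal{E}\{W^{-1}AW^{-1}\} = d_1\Sigma^{-1}A\Sigma^{-1} + d_2\Sigma^{-1}\operatorname{tr}(G^{-1}AG^{-H}) = d_1\Sigma^{-1}A\Sigma^{-1} + d_2\Sigma^{-1}\operatorname{tr}(\Sigma^{-1}A), $$

since $\operatorname{tr}(ABC) = \operatorname{tr}(CAB)$. Recall that

$$ d_1 = \frac{1}{(k-p+1)(k-p-1)} $$

and

$$ d_2 = \frac{1}{(k-p+1)(k-p)(k-p-1)}. $$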

□ 

Corollary 27 Let $W \sim \mathcal{CW}_p(k, \Sigma)$ and $k > p + 1$. Then

$$ \mathcal{E}\{(W^{-1})^{2}\} = \mathcal{E}\{W^{-1}W^{-1}\} = \frac{(\Sigma^{-1})^{2}}{(k-p+1)(k-p-1)} + \frac{\Sigma^{-1}\operatorname{tr}(\Sigma^{-1})}{(k-p+1)(k-p)(k-p-1)}. $$

This is Styan's corollary 16 [262], which was complexified by Tague [264].

Proof. Let $A = I$ in theorem 96. □


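Corollary 28 Let $W \sim \mathcal{CW}_p(k, \Sigma)$, $a \in \mathbb{C}^{p}$, and $k > p + 1$. Then

$$ \operatorname{var}(W^{-1}a) = \frac{\Sigma^{-1}aa^{H}\Sigma^{-1}}{(k-p+1)(k-p)^{2}(k-p-1)} + \frac{a^{H}\Sigma^{-1}a\,\Sigma^{-1}}{(k-p+1)(k-p)(k-p-1)}. $$

This was done originally by Tague [264], motivated by the work of Styan [262]. Tague also produced results for the real variables case.

Proof. Define

$$ \operatorname{var}(W^{-1}a) = \mathcal{E}\{W^{-1}aa^{H}W^{-1}\} - \mathcal{E}\{W^{-1}a\}\,\mathcal{E}\{a^{H}W^{-1}\}. $$

In theorem 96, let $A = aa^{H}$. Then

$$ \mathcal{E}\{W^{-1}aa^{H}W^{-1}\} = \frac{\Sigma^{-1}aa^{H}\Sigma^{-1}}{(k-p+1)(k-p-1)} + \frac{a^{H}\Sigma^{-1}a\,\Sigma^{-1}}{(k-p+1)(k-p)(k-p-1)}, $$

where

$$ \operatorname{tr}(aa^{H}\Sigma^{-1}) = \operatorname{tr}(a^{H}\Sigma^{-1}a) = a^{H}\Sigma^{-1}a, $$

which is a scalar. The numerator of the last term could also be written $\Sigma^{-1}(a^{H}\Sigma^{-1}a)$. Next,

$$ \mathcal{E}\{W^{-1}a\}\,\mathcal{E}\{a^{H}W^{-1}\} = [\mathcal{E}\{W^{-1}\}a]\,[a^{H}\mathcal{E}\{W^{-1}\}]. $$

By theorem 87, then

$$ [\mathcal{E}\{W^{-1}\}a]\,[a^{H}\mathcal{E}\{W^{-1}\}] = \frac{\Sigma^{-1}aa^{H}\Sigma^{-1}}{(k-p)^{2}}. $$

Note that

$$ \frac{1}{(k-p+1)(k-p-1)} - \frac{1}{(k-p)^{2}} = \frac{k^{2} - 2kp + p^{2} - [k^{2} - kp - k - kp + p^{2} + p + k - p - 1]}{(k-p+1)(k-p)^{2}(k-p-1)} = \frac{k^{2} - 2kp + p^{2} - k^{2} + 2kp - p^{2} + 1}{(k-p+1)(k-p)^{2}(k-p-1)} = \frac{1}{(k-p+1)(k-p)^{2}(k-p-1)}. $$

The result follows from these pieces substituted back into $\operatorname{var}(W^{-1}a)$. □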
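The key step in corollary 28 is that the numerator collapses to 1. A short exact-arithmetic spot check with Python's `fractions.Fraction` confirms the identity on a grid of $(k, p)$ with $k > p + 1$; it is a sanity check on the algebra, not a proof, and the function names are illustrative.

```python
from fractions import Fraction


def cor28_coefficient_difference(k, p):
    """d1 - 1/(k-p)^2: the coefficient of Sigma^{-1} a a^H Sigma^{-1} in var(W^{-1} a)."""
    return Fraction(1, (k - p + 1) * (k - p - 1)) - Fraction(1, (k - p) ** 2)


def check_cor28(max_p=8, max_k=40):
    """Compare with the closed form 1/[(k-p+1)(k-p)^2(k-p-1)] for all k > p + 1 on a grid."""
    for p in range(1, max_p + 1):
        for k in range(p + 2, max_k + 1):
            target = Fraction(1, (k - p + 1) * (k - p) ** 2 * (k - p - 1))
            if cor28_coefficient_difference(k, p) != target:
                return False
    return True
```

For instance, $k = 5$, $p = 2$ gives $1/8 - 1/9 = 1/72 = 1/(4\cdot 9\cdot 2)$.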

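Corollary 29 Let $W \sim \mathcal{CW}_p(k, \Sigma)$ and $k > p + 1$. Then

$$ \mathcal{E}\{\operatorname{tr}(W^{-2})\} = \frac{\operatorname{tr}(\Sigma^{-2})}{(k-p+1)(k-p-1)} + \frac{[\operatorname{tr}(\Sigma^{-1})]^{2}}{(k-p+1)(k-p)(k-p-1)}. $$

This is a variation by Tague [264] on Styan's corollary 16 [262].

Proof. $\mathcal{E}\{\operatorname{tr}(W^{-2})\} = \operatorname{tr}[\mathcal{E}\{W^{-2}\}]$. The result follows immediately from corollary 27. □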

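Corollary 30 Let $W \sim \mathcal{CW}_p(k, I)$ and $k > p + 1$. Then

$$ \mathcal{E}\{[\operatorname{tr}(W^{-1})]^{2}\} = \frac{p[p(k-p) + 1]}{(k-p+1)(k-p)(k-p-1)}. $$

This is a variation by Tague [264] on Styan's corollary 16 [262].

Proof.

$$ \mathcal{E}\{[\operatorname{tr}(W^{-1})]^{2}\} = \mathcal{E}\left\{\left[\sum_{i=1}^{p}e_i^{H}W^{-1}e_i\right]\left[\sum_{j=1}^{p}e_j^{H}W^{-1}e_j\right]\right\} = \sum_{i=1}^{p}\sum_{j=1}^{p}e_i^{H}\mathcal{E}\{W^{-1}e_ie_j^{H}W^{-1}\}e_j. $$

Note that

$$ e_i^{H}\mathcal{E}\{W^{-1}e_ie_j^{H}W^{-1}\}e_j = \begin{cases}\mathcal{E}\{(W^{ii})^{2}\}, & i = j,\\ \mathcal{E}\{W^{ii}W^{jj}\}, & i \neq j.\end{cases} $$

The term $\mathcal{E}\{(W^{ii})^{2}\}$ occurs $p$ times, and $\mathcal{E}\{W^{ii}W^{jj}\}$ occurs $p(p-1)$ times. Using equations F.3 and F.4, we get

$$ \mathcal{E}\{[\operatorname{tr}(W^{-1})]^{2}\} = \frac{p}{(k-p)(k-p-1)} + \frac{p(p-1)}{(k-p+1)(k-p-1)} = \frac{p(k-p+1) + p(p-1)(k-p)}{(k-p+1)(k-p)(k-p-1)}. $$

The numerator is simplified as follows:

$$ pk - p^{2} + p + p(pk - p^{2} - k + p) = pk - p^{2} + p + p^{2}k - p^{3} - pk + p^{2} = p + p^{2}k - p^{3} = p + p^{2}(k-p) = p[1 + p(k-p)]. $$

The result follows from this. □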
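The counting argument of corollary 30 ($p$ diagonal terms from F.3 plus $p(p-1)$ cross terms from F.4) can be checked against the closed form with exact rational arithmetic. This is a minimal sketch with illustrative names; it spot-checks the algebra on a grid rather than proving it.

```python
from fractions import Fraction


def f3(k, p):
    """Equation F.3: E{(W^{ii})^2} = 1/[(k-p)(k-p-1)]."""
    return Fraction(1, (k - p) * (k - p - 1))


def f4(k, p):
    """Equation F.4: E{W^{ii} W^{jj}} = 1/[(k-p+1)(k-p-1)] for i != j."""
    return Fraction(1, (k - p + 1) * (k - p - 1))


def expected_trace_inverse_squared(k, p):
    """Sum of p diagonal terms (F.3) and p(p-1) off-diagonal terms (F.4)."""
    return p * f3(k, p) + p * (p - 1) * f4(k, p)


def corollary30(k, p):
    """Closed form p[p(k-p) + 1] / [(k-p+1)(k-p)(k-p-1)]."""
    return Fraction(p * (p * (k - p) + 1), (k - p + 1) * (k - p) * (k - p - 1))
```

For $p = 1$ the closed form reduces to $1/[(k-1)(k-2)]$, the second inverse moment of a Gamma$(k)$ variable, as it should.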


Corollary 31 Let $W \sim \mathcal{CW}_p(k, I)$ and $k > p + 1$. Then

$$ \operatorname{var}[\operatorname{tr}(W^{-1})] = \frac{pk}{(k-p+1)(k-p)^{2}(k-p-1)}. $$

This is Styan's corollary 11 [262], which was complexified by Tague [264].

Proof.

$$ \operatorname{var}[\operatorname{tr}(W^{-1})] = \mathcal{E}\{[\operatorname{tr}(W^{-1})]^{2}\} - [\mathcal{E}\{\operatorname{tr}(W^{-1})\}]^{2}. $$

By theorem 88, $\mathcal{E}\{\operatorname{tr}(W^{-1})\} = p/(k-p)$. Using corollary 30, we get

$$ \operatorname{var}[\operatorname{tr}(W^{-1})] = \frac{p[p(k-p) + 1]}{(k-p+1)(k-p)(k-p-1)} - \frac{p^{2}}{(k-p)^{2}}. $$

Looking at the numerator of the difference, we get

$$ \begin{aligned} &p[1 + p(k-p)](k-p) - p^{2}(k-p+1)(k-p-1)\\ &\quad= p(1 + pk - p^{2})(k-p) - p^{2}(k-p+1)(k-p-1)\\ &\quad= p(k + pk^{2} - p^{2}k - p - p^{2}k + p^{3}) - p^{2}(k^{2} - pk - k - pk + p^{2} + p + k - p - 1)\\ &\quad= pk + p^{2}k^{2} - p^{3}k - p^{2} - p^{3}k + p^{4} - p^{2}k^{2} + 2p^{3}k - p^{4} + p^{2} = pk. \end{aligned} $$

Placing this over the common denominator $(k-p+1)(k-p)^{2}(k-p-1)$ yields the result. □
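Corollary 31 combines corollary 30 with the mean from theorem 88. The sketch below, again using exact `Fraction` arithmetic and illustrative names, checks that the difference equals $pk/[(k-p+1)(k-p)^{2}(k-p-1)]$ across a grid of $(k, p)$.

```python
from fractions import Fraction


def corollary30(k, p):
    """E{[tr W^{-1}]^2} for W ~ CW_p(k, I), k > p + 1."""
    return Fraction(p * (p * (k - p) + 1), (k - p + 1) * (k - p) * (k - p - 1))


def variance_trace_inverse(k, p):
    """var[tr W^{-1}] = E{[tr W^{-1}]^2} - (p/(k-p))^2, using theorem 88 for the mean."""
    return corollary30(k, p) - Fraction(p, k - p) ** 2


def corollary31(k, p):
    """Closed form pk / [(k-p+1)(k-p)^2(k-p-1)]."""
    return Fraction(p * k, (k - p + 1) * (k - p) ** 2 * (k - p - 1))
```

As a scalar check, $p = 1$, $k = 10$ gives $1/72 - 1/81 = 1/648$, the variance of $1/W$ for $W \sim$ Gamma$(10)$.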

F.5  Tague  Example:  Signal-to-Noise  Ratio 

Let $x(t) \in \mathbb{C}^{p}$ be a random output of a sensor array at time $t$, which is the sum of a signal $s(t)$ passed through a narrowband beamformer and random noise $n(t)$. Explicitly, $x(t) = ds(t) + n(t)$. The complex vector $d \in \mathbb{C}^{p}$ of unit length is the narrowband steering vector. The random noise $n(t)$ is assumed to have distribution $\mathcal{CN}_p(0, R_N)$. The random signal $s(t)$ is assumed to have distribution $\mathcal{CN}_1(0, \sigma^{2})$.

Consider a beamformer whose output $y(t)$ is given by $y(t) = w^{H}x(t)$, where

$$ w = \hat{R}_N^{-1}d \quad\text{and}\quad \hat{R}_N = \frac{1}{k}\sum_{m=1}^{k}n_mn_m^{H}, $$

with $n_1, \ldots, n_k$ the noise samples. We assume that each noise measurement is mutually independent of the signal and of the other noise measurements. The solution for $w$ is the optimum Wiener solution, and $w = R_N^{-1}d$ is the Wiener-Hopf equation. This is a common example, and it is discussed by Monzingo and Miller, Chapter 3 [185].
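The problem we want to solve is to find the signal-to-noise ratio of the beamformer output. We begin by noting

$$ y(t) = w^{H}x(t) = (\hat{R}_N^{-1}d)^{H}x(t) = d^{H}\hat{R}_N^{-1}[ds(t) + n(t)]. \qquad (F.6) $$

The expected value of the power at the beamformer output is

$$ \mathcal{E}\{|y(t)|^{2}\} = \mathcal{E}\{y^{*}(t)y(t)\} \qquad (F.7) $$

$$ = \mathcal{E}\left\{\left[(\hat{R}_N^{-1}d)^{H}[ds(t) + n(t)]\right]^{*}\left[(\hat{R}_N^{-1}d)^{H}[ds(t) + n(t)]\right]\right\} $$

$$ = \mathcal{E}\left\{\left[[d^{H}s^{*}(t) + n^{H}(t)]\hat{R}_N^{-1}d\right]\left[d^{H}\hat{R}_N^{-1}[ds(t) + n(t)]\right]\right\} $$

$$ = \mathcal{E}\{d^{H}s^{*}(t)\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}ds(t) + n^{H}(t)\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}ds(t) + d^{H}s^{*}(t)\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}n(t) + n^{H}(t)\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}n(t)\}. \qquad (F.8) $$

Now we invoke the assumption that $s(t)$ and $n(t)$ are statistically independent. Then

$$ \mathcal{E}\{|y(t)|^{2}\} = \mathcal{E}\{s^{*}(t)s(t)\}\,d^{H}\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\}d + \mathcal{E}\{n^{H}(t)\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}d\}\,\mathcal{E}\{s(t)\} + \mathcal{E}\{s^{*}(t)\}\,d^{H}\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}n(t)\} + \mathcal{E}\{n^{H}(t)\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}n(t)\}. \qquad (F.9) $$

Observing that $\mathcal{E}\{s(t)\} = 0$ and likewise $\mathcal{E}\{s^{*}(t)\} = 0$, we simplify this to

$$ \mathcal{E}\{|y(t)|^{2}\} = \mathcal{E}\{|s(t)|^{2}\}\,d^{H}\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\}d + \mathcal{E}\{n^{H}(t)\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}n(t)\}. \qquad (F.10) $$

Recall that $s(t)$ has zero mean, and thus $\mathcal{E}\{s^{*}(t)s(t)\} = \sigma^{2}$. Also note that $n^{H}(t)\hat{R}_N^{-1}d$ and $d^{H}\hat{R}_N^{-1}n(t)$ are scalars and therefore commute. The quantity $\hat{R}_N^{-1}d$ is a column vector, so $\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1} = (\hat{R}_N^{-1}d)(\hat{R}_N^{-1}d)^{H}$. Using these observations, we get

$$ \mathcal{E}\{|y(t)|^{2}\} = \sigma^{2}d^{H}\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\}d + d^{H}\mathcal{E}\{\hat{R}_N^{-1}n(t)n^{H}(t)\hat{R}_N^{-1}\}d. \qquad (F.11) $$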

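We note that $k\hat{R}_N \sim \mathcal{CW}_p(k, R_N)$. We apply theorem 96 using $A = dd^{H}$. Thus

$$ \mathcal{E}\{(k\hat{R}_N)^{-1}dd^{H}(k\hat{R}_N)^{-1}\} = \frac{1}{k^{2}}\,\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\}, $$

and since $R_N^{H} = R_N$, we get

$$ \frac{1}{k^{2}}\,\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\} = \frac{R_N^{-1}dd^{H}R_N^{-1}}{(k-p+1)(k-p-1)} + \frac{\operatorname{tr}(dd^{H}R_N^{-1})\,R_N^{-1}}{(k-p+1)(k-p)(k-p-1)}. \qquad (F.12) $$

This implies

$$ \mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\} = \frac{k^{2}R_N^{-1}dd^{H}R_N^{-1}}{(k-p+1)(k-p-1)} + \frac{k^{2}\operatorname{tr}(dd^{H}R_N^{-1})\,R_N^{-1}}{(k-p+1)(k-p)(k-p-1)}. \qquad (F.13) $$

To perform the next step, note that $\operatorname{tr}(dd^{H}R_N^{-1}) = d^{H}R_N^{-1}d$ is a scalar. This allows us to say

$$ d^{H}\operatorname{tr}(dd^{H}R_N^{-1})\,R_N^{-1}d = \operatorname{tr}(dd^{H}R_N^{-1})\,d^{H}R_N^{-1}d = (d^{H}R_N^{-1}d)^{2}. \qquad (F.14) $$

Then

$$ d^{H}\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\}d = \frac{k^{2}(d^{H}R_N^{-1}d)^{2}}{(k-p+1)(k-p-1)} + \frac{k^{2}(d^{H}R_N^{-1}d)^{2}}{(k-p+1)(k-p)(k-p-1)}. \qquad (F.15) $$

Since $R_N^{-1}$ is Hermitian, we know $d^{H}R_N^{-1}d$ is real, which implies $(d^{H}R_N^{-1}d)^{*} = d^{H}R_N^{-1}d$. Note that

$$ \frac{1}{(k-p+1)(k-p-1)} + \frac{1}{(k-p+1)(k-p)(k-p-1)} = \frac{k-p+1}{(k-p+1)(k-p)(k-p-1)} = \frac{1}{(k-p)(k-p-1)}. $$

Then

$$ \sigma^{2}d^{H}\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\}d = \frac{\sigma^{2}k^{2}(d^{H}R_N^{-1}d)^{2}}{(k-p)(k-p-1)}. \qquad (F.16) $$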

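We evaluate the second remaining term of $\mathcal{E}\{|y(t)|^{2}\}$ in stages. This is the noise component:

$$ d^{H}\mathcal{E}\{\hat{R}_N^{-1}n(t)n^{H}(t)\hat{R}_N^{-1}\}d = d^{H}\mathcal{E}\{\hat{R}_N^{-1}\,\mathcal{E}\{n(t)n^{H}(t)\}\,\hat{R}_N^{-1}\}d, $$

where we note that $\hat{R}_N^{-1}$ is Hermitian and that $n(t)$ is independent of the noise samples used to construct $\hat{R}_N$. Thus we get

$$ d^{H}\mathcal{E}\{\hat{R}_N^{-1}n(t)n^{H}(t)\hat{R}_N^{-1}\}d = d^{H}\mathcal{E}\{\hat{R}_N^{-1}R_N\hat{R}_N^{-1}\}d. $$

With $k > p + 1$, we now apply theorem 96:

$$ \mathcal{E}\{\hat{R}_N^{-1}R_N\hat{R}_N^{-1}\} = \frac{k^{2}R_N^{-1}R_NR_N^{-1}}{(k-p+1)(k-p-1)} + \frac{k^{2}\operatorname{tr}(R_NR_N^{-1})\,R_N^{-1}}{(k-p+1)(k-p)(k-p-1)} = \frac{k^{2}R_N^{-1}}{(k-p+1)(k-p-1)} + \frac{k^{2}p\,R_N^{-1}}{(k-p+1)(k-p)(k-p-1)}, \qquad (F.17) $$

where $\operatorname{tr}(R_NR_N^{-1}) = \operatorname{tr}(I) = p$, so that

$$ \mathcal{E}\{\hat{R}_N^{-1}R_N\hat{R}_N^{-1}\} = \frac{k^{2}(k-p+p)\,R_N^{-1}}{(k-p+1)(k-p)(k-p-1)} = \frac{k^{3}R_N^{-1}}{(k-p+1)(k-p)(k-p-1)}. $$

Therefore

$$ d^{H}\mathcal{E}\{\hat{R}_N^{-1}R_N\hat{R}_N^{-1}\}d = \frac{k^{3}\,d^{H}R_N^{-1}d}{(k-p+1)(k-p)(k-p-1)}. \qquad (F.18) $$

We now compute the signal-to-noise ratio:

$$ \mathrm{SNR} = \frac{\mathcal{E}\{|w^{H}ds(t)|^{2}\}}{\mathcal{E}\{|w^{H}n(t)|^{2}\}} = \frac{\sigma^{2}d^{H}\mathcal{E}\{\hat{R}_N^{-1}dd^{H}\hat{R}_N^{-1}\}d}{d^{H}\mathcal{E}\{\hat{R}_N^{-1}R_N\hat{R}_N^{-1}\}d} = \frac{\sigma^{2}k^{2}(d^{H}R_N^{-1}d)^{2}}{(k-p)(k-p-1)}\cdot\frac{(k-p+1)(k-p)(k-p-1)}{k^{3}\,d^{H}R_N^{-1}d} = \frac{k-p+1}{k}\,\sigma^{2}(d^{H}R_N^{-1}d). \qquad (F.19) $$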
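Equation F.19 says that estimating $R_N$ from $k$ snapshots costs a factor $(k-p+1)/k$ in SNR relative to the known-covariance value $\sigma^{2}d^{H}R_N^{-1}d$. The sketch below checks this two ways: exactly, by forming the F.16 and F.18 coefficients from $d_1$ and $d_2$ with `Fraction`, and by a standard-library Monte Carlo of the beamformer for $p = 2$. The Monte Carlo assumes an illustrative diagonal $R_N = \operatorname{diag}(1, 2)$ and unit-length $d = (1, i)/\sqrt{2}$, so $d^{H}R_N^{-1}d = 0.75$ and, with $k = 10$ and $\sigma^{2} = 1$, F.19 predicts $0.9 \times 0.75 = 0.675$. The SNR is taken, as in F.19, as a ratio of expected powers; the names are hypothetical.

```python
import math
import random
from fractions import Fraction


def snr_loss_factor(k, p):
    """Ratio of the F.16 and F.18 coefficients, built from the Lemma 23 constants d1, d2."""
    d1 = Fraction(1, (k - p + 1) * (k - p - 1))
    d2 = Fraction(1, (k - p + 1) * (k - p) * (k - p - 1))
    signal = k * k * (d1 + d2)     # multiplies sigma^2 (d^H R_N^{-1} d)^2
    noise = k * k * (d1 + p * d2)  # multiplies d^H R_N^{-1} d
    return signal / noise


def mc_snr(k, sigma2, rn_diag, d, trials, seed=7):
    """Monte Carlo E{sigma^2 |w^H d|^2} / E{w^H R_N w} for w = Rhat^{-1} d, p = 2, diagonal R_N."""
    rng = random.Random(seed)
    sig_acc = noise_acc = 0.0
    for _ in range(trials):
        r = [[0j, 0j], [0j, 0j]]  # sample covariance Rhat from k noise snapshots
        for _ in range(k):
            nm = [complex(rng.gauss(0.0, math.sqrt(v / 2)), rng.gauss(0.0, math.sqrt(v / 2))) for v in rn_diag]
            for i in range(2):
                for j in range(2):
                    r[i][j] += nm[i] * nm[j].conjugate() / k
        det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
        w = [(r[1][1] * d[0] - r[0][1] * d[1]) / det, (-r[1][0] * d[0] + r[0][0] * d[1]) / det]
        wd = w[0].conjugate() * d[0] + w[1].conjugate() * d[1]
        sig_acc += sigma2 * abs(wd) ** 2
        noise_acc += sum(v * abs(wi) ** 2 for v, wi in zip(rn_diag, w))
    return sig_acc / noise_acc
```

The exact factor `snr_loss_factor(k, p)` equals `Fraction(k - p + 1, k)` for every $k > p + 1$, and `mc_snr(10, 1.0, [1.0, 2.0], d, 20000)` should come out within a few percent of $0.675$.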


Appendix  G 
ZONAL POLYNOMIAL COMMENTARY
G.1 History of Development

Zonal  polynomials  are  special  functions,  just  as  Bessel,  Legendre,  Tschebyshev, 
Hankel,  and  trigonometric  functions  are  special  functions.  Zonal  polynomials 
are  important  to  this  work  because  they  are  used  to  evaluate  a  factor  term  in 
the  probability  density  function  of  the  sample  eigenvalues  of  a  Wishart  matrix. 

At first blush, zonal polynomials may appear to the casual reader to have very little to do with the content of this thesis. However, the fundamental contribution to advancing the order determination problem hinges on the existence and properties of zonal polynomials. The properties of these functions are still objects of current research. Muirhead [188] reports that, as of the writing of the articles for the Encyclopedia of Statistical Sciences published in 1988, zonal polynomials had been defined only for symmetric matrices. He gives a suggestion of how to extend the definition to Hermitian matrices. Future progress in the small sample order determination problem must build upon these concepts.

The application of zonal polynomials to the related problem of finding the probability density function of sample eigenvalues was first made by A. T. James [120], equation (94), in 1964. James derived his result for the case of a real Wishart matrix, and stated the result for the symmetric complex Wishart matrix by observing the similarity of the forms of other results. In 1987, Gross and Richards [96] derived zonal polynomials for the real, complex Hermitian, and quaternion cases simultaneously by studying invariants in a group representation setting.

Classical work in acoustic signal processing has been done using forms resulting in Hermitian Wishart matrices, which I have simply called complex Wishart matrices. Indeed, much of my development could be recast in terms of complex symmetric Wishart matrices with accompanying background theory, but at the expense of losing the properties of an inner product space, and thus also access to the concept of an adjoint. The very important contribution by Gross and Richards [96] justified applying the form of the results for the joint density of sample eigenvalues that A. T. James [120] had previously written down by inspection. The meaning of the details of James' results, however, is different.

Gross  and  Richards  [96]  point  out  that  zonal  polynomials  are  spherical 
functions  for  the  Gelfand  pair  {G,K).  In  a  general  setting,  spherical  functions 
are  studied  by  Helgason,  Chapter  IV  [105].  Readers  of  Helgason  or  Gross  and 
Richards  will  profit  by  first  preparing  a  background  in  Lie  theory. 

Muirhead [187] develops zonal polynomials for the real variables case in a manner easily followed by engineers. Because the Laplacian operator is unconditionally applied in his development, the approach does not work for the complex variable case without further restrictions. Once the general development by Gross and Richards [96], which bypasses the problem of unconditionally applying the Laplacian, is accepted, Muirhead's results apply either directly or with consideration for differences in the real dimension and structure of the real and complex variables cases. Takemura [265] provided a 7-page development of complex zonal polynomials that relied heavily on an earlier development of real zonal polynomials in that monograph.


G.2  Gross  and  Richards’  Development 

This  section  is  a  review  of  the  development  of  zonal  polynomials  done  by  Gross 
and  Richards  [96].  Although  much  of  this  discussion  is  directly  from  their 
paper,  I  have  generally  omitted  proofs  and  ventured  comments  that  would 
allow  an  engineer  to  more  easily  follow  their  paper,  provided  they  have  read 
the  various  surveys  of  algebra  and  analysis  contained  in  this  thesis.  Part  of 
the  contribution  here  is  in  helping  identify  which  spaces  are  objects  of  study 
at  any  particular  point.  I  also  attempt  to  highlight  what  is  important,  and  to 
provide  concrete  examples  at  various  points. 

Gross and Richards' work is very important. In a single treatment, they develop zonal polynomials of matrix argument for the real, complex Hermitian, and quaternion fields. The terminology they use comes out of the study of Lie groups and group representation theory. The mathematical dictionary of most frequent value in reading this paper is the one edited by Ito [115], published by MIT Press in four volumes beginning in 1987. An excellent newer mathematical dictionary, with volumes through "Sp" published as of 1992, is the translation of Vinogradov's Soviet Mathematical Encyclopaedia edited by Hazewinkel [104]. Smaller dictionaries have not proven useful for reading Gross and Richards' work. We are still in need of a dictionary to translate the technical language of algebraists into the language that engineers understand, and vice versa.

We are interested in group representation theory because it allows us to connect a practical result we need with an intuitive abstraction about the nature of the problem we are dealing with. This approach allows us to understand properties of our problem that otherwise may escape notice.

We need to evaluate $\operatorname{etr}(-\Sigma^{-1}A)$ in the development of the joint density of the eigenvalues of A. We have reduced the problem to another form, and evaluating it brings us closer to the answer we need:

$$\int_{U(n)} \operatorname{etr}\left(-\Sigma^{-1} U^{H} A U\right)(dU) \qquad (G.1)$$

Our journey will lead us to express the exponential in terms of zonal polynomials. Once this is done, we can take advantage of a splitting, or decomposition, property. This property separates the integral into the product of a function of $(-\Sigma^{-1})$ times the same function of A.
In examining our trace function, we observe that its value depends only on the sum of the eigenvalues of its argument. Suppose we look at the set of all polynomial functions of the argument and try to select an expression for the trace function. We then find that we are dealing with invariant spaces. This is our clue to consider the wonderful world of group representation theory, which leads to the development of zonal polynomials. The zonal polynomials form a basis for the space of polynomials we are interested in.
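The following sketch checks the trace observation numerically. It assumes NumPy. The helper `haar_unitary` is introduced here for illustration only and uses the standard QR-with-phase-correction construction of a Haar unitary. The check confirms that $\operatorname{tr}(U^H A U)$ equals $\operatorname{tr}(A)$ and the sum of the eigenvalues of A for a random unitary U.

```python
import numpy as np

def haar_unitary(n, rng):
    """Draw an n x n unitary matrix from the Haar measure on U(n).

    QR-factor a complex Gaussian matrix, then absorb the phases of the
    diagonal of R into Q so that the result is exactly Haar distributed.
    """
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # scale column j by the phase of r[j, j]

rng = np.random.default_rng(0)
n = 4
b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
a = b @ b.conj().T              # a Hermitian positive definite matrix A

u = haar_unitary(n, rng)
rotated = u.conj().T @ a @ u    # U^H A U

trace_a = np.trace(a).real
trace_rotated = np.trace(rotated).real
eig_sum = np.linalg.eigvalsh(a).sum()
print(trace_a, trace_rotated, eig_sum)  # all three agree up to rounding
```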

Gross and Richards [96] begin their development of zonal polynomials by considering the structure of the algebra of polynomials defined on the general linear group $G = GL(n, F)$. This group consists of all nonsingular $n \times n$ matrices whose elements are taken from the field F, which may be the reals ($\mathbb{R}$), the complex numbers ($\mathbb{C}$), or the quaternions ($\mathbb{H}$). This set of polynomials is denoted P(G). Pay close attention to the various modifications of this notation used to indicate different sets. Group representation theory first comes into play through the definition of the map R. R takes its argument from the group G and acts as a transformation on the linear space P(G). R is defined by the action

$$R(a)\varphi(x) = \varphi(xa) \qquad (G.2)$$

where $\varphi \in P(G)$ and $a, x \in G$. R is called the right regular representation of the group G on the linear space P(G). Note that

$$R(b)\left[R(a)\varphi(x)\right] = R(b)\varphi(xa) = \varphi(xab) = R(ab)\varphi(x) \qquad (G.3)$$

Thus $R(ab) = R(b)R(a)$. A function obeying $f(xy) = f(x)f(y)$ is called a "homomorphism", so we will call a function with the opposite order, $f(xy) = f(y)f(x)$, a "heteromorphism" (a term first used by Tom Concannon, 1989; such maps are elsewhere commonly called anti-homomorphisms). We note that Vilenkin [271] defines a group representation as a homomorphism, but careful examination of his nonabelian examples shows them to be heteromorphisms. The structure of group representation theory can be recast in terms of heteromorphisms without damage to its beauty.

Gross and Richards partition P(G) into the sets $P_d(G)$ of polynomials defined on G that are homogeneous of degree d. Thus

$$P(G) = \bigoplus_{d=0}^{\infty} P_d(G) \qquad (G.4)$$

They define a scalar product on P(G) in terms of a differential operator. Using it, they show that the subspaces $\{P_d(G)\}$ are mutually orthogonal and together span P(G). They avoid nagging issues of differentiability over the field of definition of the argument by regarding $\mathbb{C}$ and $\mathbb{H}$ as real vector spaces ($\mathbb{R}^2$ and $\mathbb{R}^4$). Therefore they do not require their polynomial functions to be holomorphic. They selected the inner product

$$\langle \varphi, \psi \rangle = D(\psi)\,\varphi(x)\big|_{x=0} \qquad (G.5)$$

Here $D(\psi)$ is the differential operator obtained from ψ by replacing each indeterminate with the corresponding partial derivative and each coefficient with its conjugate. They chose this inner product because it forms a weighted sum of products of coefficients attached to the same monomial, that is, the same indeterminates raised to identical powers.

Zonal polynomials are classically defined as homogeneous harmonic polynomials on the surface of a sphere. For real variables, differentiability is assumed routinely and is not a central concern, but it is a main concern for us. Polynomials in z are differentiable with respect to z. Problems arise with polynomials that include terms such as $z^*$.

Suppose, instead, that we treat polynomials as N-tuples in the way defined by Broida and Williamson (p. 253) [47]. Then a polynomial is represented as a vector of infinite length with only finitely many nonzero entries. One may then define an inner product on the vectors of polynomial coefficients.

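Define a compound index α the same way Gross and Richards did:

$$\alpha \overset{\text{def}}{=} (\alpha_1, \alpha_2, \ldots, \alpha_N) \qquad (G.6)$$

We let a term of a polynomial in N variables be given by

$$c_\alpha x^\alpha \overset{\text{def}}{=} c_\alpha\, x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_N^{\alpha_N} \qquad (G.7)$$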

We can establish a collating sequence for $(\alpha_1, \alpha_2, \ldots, \alpha_N)$ to linearize our multidimensional array of coefficients $\{c_\alpha\}$. For some fixed d, we can use a counting sequence to establish an ordering of all α such that

$$|\alpha| \overset{\text{def}}{=} \alpha_1 + \alpha_2 + \cdots + \alpha_N = d \qquad (G.8)$$

where each $\alpha_k \ge 0$ is an integer. For example, let $b = d + 1$ be the base of a number system. Then an ordering of the α with $|\alpha| = d$ is given by the number $\sum_{k=1}^{N} \alpha_k b^{N-k}$. For fixed N and fixed d, there are $\binom{d+N-1}{N-1}$ such elements.
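As a small illustration of this collating sequence, the sketch below enumerates every multi-index with $|\alpha| = d$ and sorts them by their base-$(d+1)$ value. It also confirms the count $\binom{d+N-1}{N-1}$. The function name `multi_indices` is introduced here only for illustration.

```python
from itertools import product
from math import comb

def multi_indices(n_vars, d):
    """All alpha = (alpha_1, ..., alpha_N) of nonnegative integers with |alpha| = d,
    ordered by the base-(d + 1) number sum_k alpha_k * b**(N - k)."""
    b = d + 1
    found = [alpha for alpha in product(range(d + 1), repeat=n_vars) if sum(alpha) == d]

    def collate(alpha):
        # Each alpha_k <= d < b, so this is an ordinary base-b numeral with no collisions.
        return sum(a_k * b ** (n_vars - k) for k, a_k in enumerate(alpha, start=1))

    return sorted(found, key=collate)

indices = multi_indices(3, 2)
print(indices)
print(len(indices), comb(2 + 3 - 1, 3 - 1))  # 6 6
```

Because every digit $\alpha_k$ is at most d, two different multi-indices can never map to the same number, so the ordering is total.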

Define

$$\alpha! \overset{\text{def}}{=} (\alpha_1!)(\alpha_2!)\cdots(\alpha_N!) \qquad (G.9)$$
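We may define the scalar product of two polynomials $\psi = \sum_\alpha b_\alpha x^\alpha$ and $\varphi = \sum_\alpha a_\alpha x^\alpha$ by

$$\langle \psi, \varphi \rangle = \sum_\alpha \alpha!\, \bar{b}_\alpha a_\alpha \qquad (G.10)$$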

I have switched the standard mathematician's order of arguments in the inner product to match the notation engineers commonly use for vectors, where $\langle x, y \rangle = x^H y$. Note that as long as $N < \infty$, I could just as easily define the inner product by

$$\langle \psi, \varphi \rangle = \sum_\alpha \bar{b}_\alpha a_\alpha \qquad (G.11)$$

since the finiteness guarantees convergence. I chose to retain the $\alpha!$ to keep the notation used by Gross and Richards. Note that I have explicitly not used an operator that is necessarily a differential operator. This scalar product obeys the properties of an inner product. Let

$$\|\varphi\| = \langle \varphi, \varphi \rangle^{1/2} \qquad (G.12)$$

be the inner product space norm of φ. Define the distance between ψ and φ by

$$d(\psi, \varphi) = \|\psi - \varphi\| \qquad (G.13)$$

We now have a metric space.
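The sketch below is a minimal implementation of (G.9), (G.10), (G.12) and (G.13). It assumes a polynomial is stored as a Python dict that maps each exponent tuple α to its complex coefficient. That representation and the function names are illustrative choices, not part of the original development.

```python
from math import factorial, prod, sqrt

def alpha_factorial(alpha):
    """alpha! = alpha_1! alpha_2! ... alpha_N!  (G.9)."""
    return prod(factorial(a) for a in alpha)

def inner(psi, phi):
    """<psi, phi> = sum_alpha alpha! conj(b_alpha) a_alpha  (G.10).

    psi holds the coefficients b_alpha, phi holds the a_alpha; the first
    argument is conjugated, matching the engineering convention x^H y.
    """
    return sum(alpha_factorial(al) * psi[al].conjugate() * phi[al]
               for al in psi.keys() & phi.keys())

def norm(phi):
    """||phi|| = <phi, phi>^(1/2)  (G.12)."""
    return sqrt(inner(phi, phi).real)

def distance(psi, phi):
    """d(psi, phi) = ||psi - phi||  (G.13)."""
    diff = {al: psi.get(al, 0) - phi.get(al, 0) for al in psi.keys() | phi.keys()}
    return norm(diff)

# psi = x1^2 + 2i x1 x2 and phi = 3 x1^2 - x2^2, in N = 2 variables
psi = {(2, 0): 1 + 0j, (1, 1): 2j}
phi = {(2, 0): 3 + 0j, (0, 2): -1 + 0j}
print(inner(psi, phi))   # 6 as a complex number: only x1^2 is shared, weighted by 2! = 2
print(norm(psi) ** 2)    # approximately 6.0: 2! * 1 + 1! 1! * |2i|^2
```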

The purpose of the inner product defined by Gross and Richards was to define orthogonality. The inner product defined above yields the same orthogonality results. Gross and Richards use their inner product to show that $\{P_d(G)\}$ is a set of mutually orthogonal spaces of homogeneous polynomials that together span P(G). The Broida and Williamson construction permits the same observation without grappling with differentiability. Numerically, the two inner product definitions agree. Weighting schemes that do not use the $\alpha!$ term in the sum can also produce the required properties. The requirement is to obtain a convergent series, particularly when N becomes unbounded, and to satisfy the properties of an inner product. This means that differentiability, and thus harmonicity, are not required properties of the polynomials developed by Gross and Richards. Recall that zonal polynomials are characterized as homogeneous harmonic polynomials defined on the surface of an n-dimensional sphere. Such polynomials are special cases of the set of polynomials developed by Gross and Richards. Harmonicity is an additional benefit of selecting $\alpha!$ as the weighting term, because this choice lets the inner product be interpreted as a differential operator, as Gross and Richards do. Their properties for the representations R and τ continue to hold when the weights are selected to produce a finite-valued inner product.
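To illustrate the claim that the two inner product definitions agree numerically, the following sketch computes the differential-operator form $D(\psi)\varphi(x)|_{x=0}$ directly. It builds $D(\psi)$ by substituting $\partial/\partial x_k$ for each $x_k$ and conjugating ψ's coefficients, then compares the result with the weighted coefficient sum (G.10). It also shows that homogeneous pieces of different degree are orthogonal. The dict representation and all function names are assumptions made for illustration.

```python
from math import factorial, prod

def differentiate(phi, beta):
    """Apply d^beta = prod_k (d/dx_k)^beta_k to phi, a dict {alpha: coefficient}."""
    out = {}
    for alpha, c in phi.items():
        if all(a >= b for a, b in zip(alpha, beta)):
            # (d/dx)^b x^a = a!/(a-b)! x^(a-b), applied coordinate by coordinate
            factor = prod(factorial(a) // factorial(a - b) for a, b in zip(alpha, beta))
            key = tuple(a - b for a, b in zip(alpha, beta))
            out[key] = out.get(key, 0) + factor * c
    return out

def fischer(psi, phi):
    """D(psi) phi evaluated at x = 0, with psi's coefficients conjugated."""
    total = 0
    for beta, b in psi.items():
        zero = tuple(0 for _ in beta)
        total += b.conjugate() * differentiate(phi, beta).get(zero, 0)
    return total

def weighted(psi, phi):
    """sum_alpha alpha! conj(b_alpha) a_alpha, the coefficient form (G.10)."""
    return sum(prod(factorial(a) for a in al) * psi[al].conjugate() * phi[al]
               for al in psi.keys() & phi.keys())

# Mixed-degree polynomials in N = 3 variables
psi = {(2, 1, 0): 1 + 0j, (0, 0, 1): 1 - 2j, (0, 0, 0): 4 + 0j}
phi = {(2, 1, 0): 2 + 1j, (0, 0, 1): -1 + 0j, (1, 1, 1): 1 + 0j, (0, 0, 0): 7 + 0j}
print(fischer(psi, phi), weighted(psi, phi))   # both equal 31

# Homogeneous pieces of different degree are orthogonal under either definition
deg3, deg1 = {(2, 1, 0): 1 + 0j}, {(0, 0, 1): -1 + 0j}
print(fischer(deg3, deg1), weighted(deg3, deg1))
```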

Let K be a maximal compact subgroup of G. When F = C, K = U(n) is the set of unitary $n \times n$ matrices

$$K = \{k : k k^H = I_n,\ k \in G\} \qquad (G.14)$$
Gross and Richards use the algebraist's convention of writing 1 rather than I to denote the multiplicative identity element. They show that the right regular representation R of G acting on the linear space of polynomials P(G) is unitary when the argument of R is restricted to elements $k \in K \subset G$. Thus

$$\langle R(k)\psi, R(k)\varphi \rangle = \langle \psi, \varphi \rangle \qquad (G.15)$$

Gross and Richards construct their practical result, which is a set of polynomials that they know how to compute. Let

$$v c u = x \qquad (G.16)$$

be the LDU decomposition of x when x is represented by a square $n \times n$ matrix. See Stewart (p. 132) [259] for a discussion of the LDU decomposition. The set of lower triangular matrices with ones on the main diagonal is V, and we note that $v \in V$. The set of upper triangular matrices with ones on the main diagonal is U, and we note that $u \in U$. Let $(\Omega, \oplus, \odot)$ be a ring with identity elements $e_\oplus$ and $e_\odot$, and let $\omega \in \Omega$. Then the element ω of the ring is called nilpotent of order k if k is the smallest positive integer such that

$$\omega^k = \underbrace{\omega \odot \omega \odot \cdots \odot \omega}_{k \text{ times}} = e_\oplus$$

that is, $\omega^k$ is the additive identity element of the ring. If we remove the ones from the main diagonals of v and u, the resulting matrices are nilpotent. Write (V − I) for the nilpotent matrices constructed from elements of V, and (U − I) for those constructed from elements of U. The product of any n matrices from (V − I) is the zero matrix, and so is the product of any n matrices from (U − I). A linear transformation is called unipotent if it has the form I + A where A is nilpotent [116]. Thus the elements of V and U are unipotent. The set of $n \times n$ nonsingular diagonal matrices is C, and we let $c \in C$. The triple (U, C, V) is called the standard bitriangular structure for G.
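The LDU factors and the nilpotency of $V - I$ and $U - I$ can be checked numerically. The sketch below is a textbook Gaussian-elimination LDU without pivoting, assuming NumPy and a matrix whose leading principal minors are all nonzero (true with probability one for the random complex matrix used here). The function name `ldu` is hypothetical.

```python
import numpy as np

def ldu(x):
    """LDU factorization x = v @ c @ u by Gaussian elimination without pivoting.

    v is unit lower triangular (in V), c is diagonal (in C), u is unit upper
    triangular (in U). Assumes every leading principal minor of x is nonzero.
    """
    n = x.shape[0]
    work = x.astype(complex)
    v = np.eye(n, dtype=complex)
    for k in range(n - 1):
        mult = work[k + 1:, k] / work[k, k]     # elimination multipliers below pivot k
        v[k + 1:, k] = mult
        work[k + 1:, :] -= np.outer(mult, work[k, :])
    pivots = np.diag(work).copy()
    c = np.diag(pivots)
    u = np.triu(work) / pivots[:, None]         # divide each row by its pivot
    np.fill_diagonal(u, 1.0)
    return v, c, u

rng = np.random.default_rng(1)
n = 4
x = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
v, c, u = ldu(x)

# Removing the unit diagonals leaves strictly triangular, hence nilpotent, matrices:
# the n-th power (indeed any product of n such matrices) is the zero matrix.
nil_v = np.linalg.matrix_power(v - np.eye(n), n)
nil_u = np.linalg.matrix_power(u - np.eye(n), n)
print(np.allclose(v @ c @ u, x), np.allclose(nil_v, 0), np.allclose(nil_u, 0))  # True True True
```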

Let

$$m = (m_1, m_2, \ldots, m_n) \qquad (G.17)$$

such that

$$m_1 \ge m_2 \ge \cdots \ge m_n \ge 0 \qquad (G.18)$$

The set of polynomials $P^{2m}(G)$ is defined as the set of all polynomials having the property

$$\varphi(vcx) = \mu_{2m}(c)\,\varphi(x) \qquad (G.19)$$

for all $v \in V$ and $c \in C$, where

$$\mu_{2m}(c) = |c_1|^{2m_1} |c_2|^{2m_2} \cdots |c_n|^{2m_n} \qquad (G.20)$$

This function, $\mu_{2m}(c)$, is called the character of C. The set $P^{2m}(G)$ is invariant under right translation by G. Let $\pi_{2m}$ be R restricted to act only on $\varphi \in P^{2m}(G)$. Then

$$\pi_{2m}(a)\varphi(x) = \varphi(xa) \qquad (G.21)$$

describes the action of the right regular representation of G on the linear space $P^{2m}(G)$.

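Consider the structural relationship between the leading principal submatrices of the factors in an LDU decomposition and the corresponding leading principal submatrix of the original matrix x. An example with n = 4 will provide a foundation for understanding the definitions and properties that follow.

$$x = vcu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ v_{21} & 1 & 0 & 0 \\ v_{31} & v_{32} & 1 & 0 \\ v_{41} & v_{42} & v_{43} & 1 \end{pmatrix} \begin{pmatrix} c_1 & 0 & 0 & 0 \\ 0 & c_2 & 0 & 0 \\ 0 & 0 & c_3 & 0 \\ 0 & 0 & 0 & c_4 \end{pmatrix} \begin{pmatrix} 1 & u_{12} & u_{13} & u_{14} \\ 0 & 1 & u_{23} & u_{24} \\ 0 & 0 & 1 & u_{34} \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (G.22)$$

$$= \begin{pmatrix} c_1 & c_1 u_{12} & c_1 u_{13} & c_1 u_{14} \\ v_{21} c_1 & v_{21} c_1 u_{12} + c_2 & v_{21} c_1 u_{13} + c_2 u_{23} & v_{21} c_1 u_{14} + c_2 u_{24} \\ v_{31} c_1 & v_{31} c_1 u_{12} + v_{32} c_2 & v_{31} c_1 u_{13} + v_{32} c_2 u_{23} + c_3 & v_{31} c_1 u_{14} + v_{32} c_2 u_{24} + c_3 u_{34} \\ v_{41} c_1 & v_{41} c_1 u_{12} + v_{42} c_2 & v_{41} c_1 u_{13} + v_{42} c_2 u_{23} + v_{43} c_3 & v_{41} c_1 u_{14} + v_{42} c_2 u_{24} + v_{43} c_3 u_{34} + c_4 \end{pmatrix} \qquad (G.23)$$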

What you need to notice here is the partitioning of the matrix into successively smaller matrices anchored in the upper left corner. These matrices are called leading principal submatrices. Let $A_k$ denote the $k \times k$ leading principal submatrix of the square matrix A. Then, for the LDU decomposition of x given by $vcu = x$, it is also true that $v_k c_k u_k = x_k$. Define

$$\Delta_k(x) = (\det x_k)^{\eta} \qquad (G.24)$$

where η = 1 if F = R or F = C, and η = 1/2 for F = H. With this notation, let

$$\varphi_{2m}(x) = \prod_{i=1}^{n} |\Delta_i(x)|^{2(m_i - m_{i+1})}, \qquad m_{n+1} \overset{\text{def}}{=} 0 \qquad (G.25)$$

Note that $\varphi_{2m} \in P^{2m}(G)$.
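The following sketch checks two of these facts numerically for F = C (so η = 1), assuming NumPy. First, leading principal submatrices factor the same way as the whole matrix. Second, $\varphi_{2m}$ from (G.25) satisfies the defining property (G.19). The helper names `delta`, `phi_2m`, `mu_2m` and `cplx` and the chosen signature m = (3, 1, 0) are illustrative assumptions.

```python
import numpy as np

def delta(x, k):
    """Delta_k(x) = det x_k, the k x k leading principal minor (eta = 1, F = C)  (G.24)."""
    return np.linalg.det(x[:k, :k])

def phi_2m(x, m):
    """phi_2m(x) = prod_i |Delta_i(x)|^(2(m_i - m_{i+1})), m_{n+1} = 0  (G.25)."""
    m_ext = list(m) + [0]
    return np.prod([abs(delta(x, i + 1)) ** (2 * (m_ext[i] - m_ext[i + 1]))
                    for i in range(len(m))])

def mu_2m(c_diag, m):
    """Character mu_2m(c) = |c_1|^(2 m_1) ... |c_n|^(2 m_n)  (G.20)."""
    return np.prod(np.abs(c_diag) ** (2 * np.asarray(m)))

def cplx(rng, *shape):
    """A random complex Gaussian array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(2)
n, m, k = 3, (3, 1, 0), 2
v = np.tril(cplx(rng, n, n), -1) + np.eye(n)   # unit lower triangular, v in V
u = np.triu(cplx(rng, n, n), 1) + np.eye(n)    # unit upper triangular, u in U
c_diag = cplx(rng, n)
c = np.diag(c_diag)                             # nonsingular diagonal, c in C
x = cplx(rng, n, n)

# Leading principal submatrices factor the same way: (vcu)_k = v_k c_k u_k.
ldu_ok = np.allclose((v @ c @ u)[:k, :k], v[:k, :k] @ c[:k, :k] @ u[:k, :k])
# Because vc is lower triangular, (vcx)_k = v_k c_k x_k, so Delta_k(vcx) = c_1...c_k Delta_k(x)
# and therefore phi_2m(vcx) = mu_2m(c) phi_2m(x), which is property (G.19).
g19_ok = np.isclose(phi_2m(v @ c @ x, m), mu_2m(c_diag, m) * phi_2m(x, m))
print(ldu_ok, g19_ok)   # True True
```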

$\varphi_{2m}$ and $\pi_{2m}$ turn out to have special properties. $\pi_{2m}$ is an irreducible representation of G on $P^{2m}(G)$; it is the subrepresentation of R on the subspace $P^{2m}(G)$. The function $\varphi_{2m}$ has a group-theoretic definition as the highest weight vector of $\pi_{2m}$. It is the element of $P^{2m}(G)$, unique up to scalar multiples, for which

$$\pi_{2m}(cu)\,\varphi_{2m}(x) = \mu_{2m}(c)\,\varphi_{2m}(x) \qquad (G.26)$$

for all $(c, u) \in C \times U$. From this we know that $P^{2m}(G)$ is the span of the right translates of $\varphi_{2m}$ under the group G. This form looks like the familiar equation $Ax = \lambda x$ that defines the eigenvalues and eigenvectors of A.
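Here is a numerical check of the highest weight property (G.26) for F = C, assuming NumPy. It also checks the homogeneity behind the inclusion $P^{2m}(G) \subset P_d(G)$ with d = 2|m|. The helper functions repeat (G.24), (G.25) and (G.20). The signature m = (2, 2, 1) and the scale factor t are arbitrary illustrative choices.

```python
import numpy as np

def delta(x, k):
    return np.linalg.det(x[:k, :k])        # Delta_k(x), eta = 1 (F = C)

def phi_2m(x, m):
    m_ext = list(m) + [0]                  # m_{n+1} = 0
    return np.prod([abs(delta(x, i + 1)) ** (2 * (m_ext[i] - m_ext[i + 1]))
                    for i in range(len(m))])

def mu_2m(c_diag, m):
    return np.prod(np.abs(c_diag) ** (2 * np.asarray(m)))

rng = np.random.default_rng(8)
n, m = 3, (2, 2, 1)
x = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
c_diag = rng.standard_normal(n) + 1j * rng.standard_normal(n)
u = np.triu(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)), 1) + np.eye(n)
a = np.diag(c_diag) @ u                    # an element cu of CU (upper triangular)

# (G.26): right translation by cu only rescales phi_2m by the character mu_2m(c),
# because (x c u)_k = x_k c_k u_k for upper triangular cu.
g26_ok = np.isclose(phi_2m(x @ a, m), mu_2m(c_diag, m) * phi_2m(x, m))

# Degree check for P^2m(G) in P_d(G), d = 2|m|: for real t > 0, phi_2m(t x) = t^(2|m|) phi_2m(x)
t = 1.7
degree_ok = np.isclose(phi_2m(t * x, m), t ** (2 * sum(m)) * phi_2m(x, m))
print(g26_ok, degree_ok)   # True True
```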

We observe that $P^{2m}(G) \subset P_d(G)$ where $d = 2|m|$.

Gross and Richards also engage in intuitive abstraction. Recall that our motivating problem involves the trace function, which yields the sum of the eigenvalues of its matrix argument. Further, we are concerned with the trace of the product of Hermitian matrices. Let A and B be Hermitian positive definite. Then A may be factored as $A = CC^H$. By the cyclic property of the trace function,

$$\operatorname{tr}(BA) = \operatorname{tr}(BCC^H) = \operatorname{tr}(C^H B C)$$

The matrix $D = C^H B C$ is also Hermitian. Thus, we are interested in properties of tr(s) where $s \in S$ and S is the set of Hermitian matrices. We know that somehow we need to work with invariant subspaces.
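The reduction from tr(BA) to the trace of a single Hermitian matrix can be checked numerically. The sketch below assumes NumPy. It uses the Cholesky factor as one convenient choice of C with $A = CC^H$; any such factor would do.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

def random_hpd(n):
    """A random Hermitian positive definite matrix."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T + n * np.eye(n)

a, b = random_hpd(n), random_hpd(n)
c = np.linalg.cholesky(a)          # A = C C^H with C lower triangular
d = c.conj().T @ b @ c             # D = C^H B C

trace_ok = np.isclose(np.trace(b @ a), np.trace(d))     # tr(BA) = tr(C^H B C)
hermitian_ok = np.allclose(d, d.conj().T)              # D is Hermitian
# BA = C^{-H} D C^H is similar to D, so both share the same real, positive eigenvalues
eig_ok = np.allclose(np.sort(np.linalg.eigvals(b @ a).real), np.linalg.eigvalsh(d))
print(trace_ok, hermitian_ok, eig_ok)   # True True True
```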

Let

$$I(G) = \{p \in P(G) : p(kx) = p(x) \text{ for all } k \in K\} = P(G)^K \qquad (G.27)$$

be the set of all polynomials on G that are left-invariant under translation by elements of K. Let $I(G)_d \subset I(G)$ be the set of left K-invariant polynomials that are homogeneous of degree d. Note that $-I \in K$ since $(-I)(-I)^H = I$. Thus $p(-x) = p(x)$ for every $p \in I(G)$. A polynomial homogeneous of degree d satisfies $p(-x) = (-1)^d p(x)$, so only polynomials homogeneous of even degree can be nonzero in I(G). Therefore

$$I(G) = \bigoplus_{d=0}^{\infty} I(G)_{2d} \qquad (G.28)$$

Let $\varphi \in P(G)$. Define the spherical transformation $E : \varphi \mapsto \varphi^*$ by

$$\varphi^*(x) = \int_K \varphi(kx)\,dk \qquad (G.29)$$

for all $x \in G$. This is the orthogonal projection of P(G) onto I(G). The key observation by Gross and Richards that links the practical with the abstract is their Theorem 3.4 (presented next), which relies on Schur's lemma. It is the ordering of the $\{m_i\}$ in this theorem that establishes the ordering of the eigenvalues in the density function used for theorem 70.
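The spherical transformation (G.29) can be approximated by averaging over Haar-distributed unitary matrices. The following is a Monte Carlo sketch, assuming NumPy. It uses the hypothetical test polynomial $\varphi(x) = |x_{11}|^2$, whose transform has the closed form $(x^H x)_{11}/n$. That closed form follows from $E[r_i \bar{r}_j] = \delta_{ij}/n$ for the first row r of a Haar unitary, a standard fact assumed here. The example shows that $\varphi^*$ is left K-invariant even though φ itself is not.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed n x n unitary: QR of a complex Gaussian matrix with phase fix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def phi(x):
    """A hypothetical test polynomial in the entries of x and their conjugates."""
    return abs(x[0, 0]) ** 2

def phi_star_exact(x):
    """Closed form of the spherical transform of this phi: (x^H x)_11 / n."""
    return (x.conj().T @ x)[0, 0].real / x.shape[0]

rng = np.random.default_rng(4)
n = 3
x = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Monte Carlo version of (G.29): average phi(kx) over Haar-distributed k in K = U(n)
phi_star_mc = np.mean([phi(haar_unitary(n, rng) @ x) for _ in range(20000)])

# phi* is left K-invariant: phi*(k0 x) = phi*(x) for any unitary k0
k0 = haar_unitary(n, rng)
print(phi_star_mc, phi_star_exact(x))
print(np.isclose(phi_star_exact(k0 @ x), phi_star_exact(x)))   # True
```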


Theorem 97 Let $d \ge 0$ and $m = (m_1, \ldots, m_n)$ with $m_1 \ge \cdots \ge m_n \ge 0$ and $|m| = d$. Then

1. The restriction $E_{2m}$ of E to $P^{2m}(G)$ is an isomorphism of $P^{2m}(G)$ onto a subspace $I^{2m}(G)$ of I(G), and

2. $$I(G)_{2d} = \bigoplus_{|m| = d} I^{2m}(G) \qquad (G.30)$$
is the decomposition of $I(G)_{2d}$ into irreducible subspaces.

Each space

$$I^{2m}(G) = E\left(P^{2m}(G)\right) \qquad (G.31)$$

is an irreducible right-invariant subspace of I(G). The dimension of $I(G)_{2d}$ is

$$\sum_{|m| = d} \deg \pi_{2m} \qquad (G.32)$$

Let ρ be the subrepresentation of R on the subspace I(G) of P(G). Then

$$\rho_{2m}(a)\,\varphi^*(x) = E_{2m}\,\pi_{2m}(a)\,E_{2m}^{-1}\,\varphi^*(x) \qquad (G.33)$$

$\rho_{2m}$ is the irreducible representation of signature 2m that acts by right translation on the space $I^{2m}(G)$. Note that this looks like a change of basis.

The most important fact Gross and Richards highlight is the relationship between $I^{2m}(G)$ and $P^{2m}(G)$. Because we know how to compute $\varphi \in P^{2m}(G)$, we can find the corresponding $\varphi^* \in I^{2m}(G)$.

Let P(S) be the algebra of all polynomials on the set of Hermitian matrices $S = \{x : x = x^H\}$. Let $P_d(S)$ be the subspace of P(S) consisting of polynomials homogeneous of degree d. Let P(S) use the same inner product used on P(G).
Gross and Richards define a different representation on S, selected to preserve the structure of the argument of $q \in P(S)$. Let

$$\tau(a)q(s) = q(a^H s a) \qquad (G.34)$$

for $a \in G$, $s \in S$. Note that equation G.34 is slightly different in form from the one presented by Gross and Richards in their equation 4.2(2). When the argument of τ is restricted to K, τ is unitary. This means

$$\langle \tau(k)q_1(s), \tau(k)q_2(s) \rangle = \langle q_1(s), q_2(s) \rangle \qquad (G.35)$$

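Define a mapping $\Omega : I(G) \to P(S)$, $\Omega p = q$, by

$$p(x) = q(x^H x) \qquad (G.36)$$

for $p \in I(G)$ and $q \in P(S)$. Note that

$$\rho(a)p(x) = p(xa) = q(a^H x^H x a) = \tau(a)q(x^H x) \qquad (G.37)$$

for all $a \in G$, so that $\Omega\rho(a) = \tau(a)\Omega$. We can also see that $\Omega : I(G)_{2d} \to P_d(S)$, and Ω is an isomorphism. If we denote the restriction of Ω to $I^{2m}(G)$ by $\Omega_{2m}$, then

$$P^m(S) = \Omega_{2m}\left(I^{2m}(G)\right) \qquad (G.38)$$

and

$$\tau_{2m}(a)\,q(s) = \Omega_{2m}\,\rho_{2m}(a)\,\Omega_{2m}^{-1}\,q(s) \qquad (G.39)$$

$\tau_{2m}$ is the irreducible representation of G with signature 2m acting in the subspace $P^m(S)$ of $P_d(S)$, and we get

$$P_d(S) = \bigoplus_{|m| = d} P^m(S) \qquad (G.40)$$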
and

$$\tau = \bigoplus_{|m| = d} \tau_{2m} \qquad (G.41)$$

If we let

$$q_m(s) = \prod_{j=1}^{n} \Delta_j(s)^{m_j - m_{j+1}} \qquad (G.42)$$

where $|m| = d$, then $q_m$ is homogeneous of degree d and

$$\tau(cu)\,q_m = \mu_{2m}(c)\,q_m$$

Note that $q_m \in P^m(S)$. It is the highest weight vector of $\tau_{2m}$.
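The sketch below checks three facts for F = C, assuming NumPy: the highest weight property of $q_m$ under τ, the left K-invariance of $p(x) = q_m(x^H x)$, and the intertwining relation (G.37). The helper `tau` returns the translated polynomial as a new Python function. This is an illustrative device, and the signature m = (2, 1, 0) is arbitrary.

```python
import numpy as np

def delta(s, k):
    return np.linalg.det(s[:k, :k])           # Delta_k(s), eta = 1

def q_m(s, m):
    """q_m(s) = prod_j Delta_j(s)^(m_j - m_{j+1}), m_{n+1} = 0  (G.42)."""
    m_ext = list(m) + [0]
    return np.prod([delta(s, j + 1) ** (m_ext[j] - m_ext[j + 1]) for j in range(len(m))])

def mu_2m(c_diag, m):
    return np.prod(np.abs(c_diag) ** (2 * np.asarray(m)))

def tau(a, q):
    """(tau(a) q)(s) = q(a^H s a), the action (G.34), returned as a new function of s."""
    return lambda s: q(a.conj().T @ s @ a)

rng = np.random.default_rng(9)
n, m = 3, (2, 1, 0)

def cplx(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def qm(s):
    return q_m(s, m)

g = cplx(n, n)
s = g @ g.conj().T + np.eye(n)                 # a Hermitian positive definite s in S
c_diag = cplx(n)
u = np.triu(cplx(n, n), 1) + np.eye(n)

# Highest weight property: tau(cu) q_m = mu_2m(c) q_m
hw_ok = np.isclose(tau(np.diag(c_diag) @ u, qm)(s), mu_2m(c_diag, m) * qm(s))

# Omega: p(x) = q(x^H x) is left K-invariant, and p(xa) = (tau(a) q)(x^H x) as in (G.37)
x, a = cplx(n, n), cplx(n, n)
k, _ = np.linalg.qr(cplx(n, n))                # a unitary matrix

def p(y):
    return qm(y.conj().T @ y)

left_inv_ok = np.isclose(p(k @ x), p(x))
intertwine_ok = np.isclose(p(x @ a), tau(a, qm)(x.conj().T @ x))
print(hw_ok, left_inv_ok, intertwine_ok)   # True True True
```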

Since we have an isomorphism between $P^m(S)$ and $I^{2m}(G)$, we know there exists some K-invariant polynomial in $P^m(S)$. Let

$$f_m(s) = \int_K q_m(k^H s k)\,dk \qquad (G.43)$$

This is similar to the spherical transformation we performed earlier. This $f_m$ is unique up to constant multiples in $P^m(S)$. Note that

$$f_m(k^H s k) = f_m(s) \qquad (G.44)$$

for all $s \in S$ and $k \in K$. Gross and Richards point out that any Hermitian matrix s can be diagonalized by some element $k \in K$. Therefore $f_m$ is uniquely determined by its restriction to the diagonal matrices in S, which all have real entries. As a polynomial in the n diagonal entries that is homogeneous of degree d, $f_m$ is invariant under the action of the symmetric group on n letters. The dimension of the space of such polynomials is the number of partitions m of the positive integer d into at most n parts. There can be only one linearly independent K-invariant member of $P^m(S)$.
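The counting argument can be checked combinatorially. The sketch below does this in plain Python. The signatures m with $|m| = d$ are listed as nonincreasing n-tuples. Their number is compared with the dimension of the symmetric homogeneous polynomials of degree d in n variables, counted as the number of distinct monomial orbits under permutation. Both function names are hypothetical.

```python
from itertools import product

def partitions(d, n, largest=None):
    """Signatures m_1 >= ... >= m_n >= 0 with |m| = d, as n-tuples."""
    largest = d if largest is None else largest
    if n == 0:
        return [()] if d == 0 else []
    return [(first,) + rest
            for first in range(min(d, largest), -1, -1)
            for rest in partitions(d - first, n - 1, first)]

def symmetric_dimension(d, n):
    """Dimension of symmetric homogeneous degree-d polynomials in n variables:
    one monomial symmetric function per multi-index orbit under permutation."""
    return len({tuple(sorted(a, reverse=True))
                for a in product(range(d + 1), repeat=n) if sum(a) == d})

print(partitions(4, 3))            # [(4, 0, 0), (3, 1, 0), (2, 2, 0), (2, 1, 1)]
print(symmetric_dimension(4, 3))   # 4
```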

Definition 8 A nonzero K-invariant polynomial $f_m$ in $P^m(S)$ is called a zonal polynomial of S of weight m.

These zonal polynomials of different signatures m form a basis $\{f_m\}$ for the set $P(S)^K$ of K-invariant polynomials on S. Thus

$$P(S)^K = \bigoplus_m \mathbb{C} f_m \qquad (G.45)$$

Gross and Richards follow the convention, common among workers in analysis, of absorbing constants into a general constant at each step of a derivation without indexing or otherwise distinguishing them. They point out that zonal polynomials corresponding to different signatures are orthogonal, since they lie in different subspaces.

Recall that I used a definition of the inner product different from that of Gross and Richards. They used the properties of inner products to demonstrate some important properties of zonal polynomials in their Lemma 5.2. For this reason, a proof of Lemma 5.2 using the new inner product is given here. Their Lemma 5.2 remains valid.

Theorem 98 For any $d \ge 0$,

$$(\operatorname{tr} s)^d = \sum_{|m| = d} a_m f_m(s)$$

where

$$a_m = d!\,\|f_m\|^{-2} > 0$$

for all m. This is a modified version of Gross and Richards' Lemma 5.2.


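Proof. $(\operatorname{tr} s)^d$ is homogeneous of degree d, so it is a linear combination of all $f_m(s)$ with $|m| = d$, and thus

$$(\operatorname{tr} s)^d = \sum_{|m| = d} a_m f_m(s) \qquad (G.46)$$

for some suitable choice of the $a_m$.

By the definition of the exponential function, we know

$$\exp[\operatorname{tr} s] = \sum_{d=0}^{\infty} \frac{(\operatorname{tr} s)^d}{d!} = \sum_{d=0}^{\infty} \frac{1}{d!} \sum_{|n| = d} a_n f_n(s) \qquad (G.47)$$

Consider the inner product $\langle f_m(s), \exp[\operatorname{tr} s] \rangle$. Because zonal polynomials of different signatures are orthogonal, only the term n = m survives:

$$\langle f_m(s), \exp[\operatorname{tr} s] \rangle = \Big\langle f_m(s), \sum_{d=0}^{\infty} \frac{1}{d!} \sum_{|n| = d} a_n f_n(s) \Big\rangle = \frac{a_m}{d!}\,\|f_m\|^2 \qquad (G.48)$$

where $d = |m|$. Suppose we normalize the coefficients of the polynomial $f_m$ so that

$$\langle f_m(s), \exp[\operatorname{tr} s] \rangle = 1 \qquad (G.49)$$

If we do this, then

$$a_m = d!\,\|f_m\|^{-2} > 0 \qquad (G.50)$$

We know $a_m > 0$ because $\|f_m\| > 0$ unless $f_m = 0$ for all s, and $d! \neq 0$. □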
Definition 9 We define $Z_m$ to be a zonal polynomial with a different normalization, so that $Z_m = a_m f_m$ where $a_m = d!\,\|f_m\|^{-2}$.

This means

$$(\operatorname{tr} s)^d = \sum_{|m| = d} Z_m(s) \qquad (G.51)$$

and also

$$Z_m(k^H s k) = Z_m(s) \qquad (G.52)$$

which means that $Z_m$ is a function only of the eigenvalues of s.
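To make (G.51) concrete for the complex case, the sketch below relies on an assumption not stated in the text above: the standard identification $Z_m = f^m s_m$. Here $s_m$ is the Schur polynomial of the eigenvalues and $f^m$ is the number of standard Young tableaux of shape m, given by the hook length formula. Under that assumption, (G.51) reduces to the symmetric-function identity $(x_1 + \cdots + x_n)^d = \sum_m f^m s_m(x)$. The code checks this identity in exact rational arithmetic for a few sets of distinct, hypothetical eigenvalues. All names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod

def det(mat):
    """Exact determinant by the Leibniz formula (adequate for small n)."""
    n = len(mat)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        total += (-1) ** inversions * prod(mat[i][perm[i]] for i in range(n))
    return total

def schur(lam, xs):
    """Schur polynomial s_lambda(x_1, ..., x_n) via the bialternant formula (distinct x)."""
    n = len(xs)
    lam = list(lam) + [0] * (n - len(lam))
    num = det([[x ** (lam[j] + n - 1 - j) for j in range(n)] for x in xs])
    den = det([[x ** (n - 1 - j) for j in range(n)] for x in xs])
    return Fraction(num, den)

def hook_dimension(lam):
    """f^lambda = d! / (product of hook lengths)."""
    lam = [p for p in lam if p > 0]
    conj = [sum(1 for p in lam if p > j) for j in range(lam[0])] if lam else []
    hooks = prod(lam[i] + conj[j] - i - j - 1 for i in range(len(lam)) for j in range(lam[i]))
    return factorial(sum(lam)) // hooks

def partitions(d, n, largest=None):
    largest = d if largest is None else largest
    if n == 0:
        return [()] if d == 0 else []
    return [(first,) + rest for first in range(min(d, largest), -1, -1)
            for rest in partitions(d - first, n - 1, first)]

def zonal_complex(lam, eigs):
    """Assumed normalization Z_m = f^m s_m (complex case), evaluated on eigenvalues."""
    return hook_dimension(lam) * schur(lam, eigs)

eigs, d = (1, 2, 3), 4            # hypothetical eigenvalues of a Hermitian s
lhs = sum(eigs) ** d
rhs = sum(zonal_complex(m, eigs) for m in partitions(d, len(eigs)))
print(lhs, rhs)                   # 1296 1296
```

Partitions with more than n parts are omitted because their Schur polynomials vanish in n variables.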

The last result needed from Gross and Richards' paper is their Proposition 5.5.

Proposition 41 For any $s, t \in S$,

$$\int_K Z_m(s k^{-1} t k)\,dk = \frac{Z_m(s)\,Z_m(t)}{Z_m(I)}$$

where dk is the normalized Haar measure on K.

The integral is known as the "splitting property" for zonal polynomials. In K we know, since $k^H k = I$, that $k^{-1} = k^H$. The integral can therefore be written as

$$\int_K Z_m(s k^H t k)\,dk \qquad (G.53)$$

The  proof  is  done  by  showing 


(G.54) 
where $p = Z(x^H x)$. Note, in following Gross and Richards' proof, that the functions involved are left K-invariant as well as right K-invariant. Thus, as a function of x, $\int_K p(ykx)\,dk$ is a left K-invariant element of I(G).

This splitting property, for the case of complex variables, fills in the steps that justify James' equation (92) [120], and it is the complex analog of his equation (23). The $Z_m$ of Gross and Richards is the $C_m$ of James.
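The splitting property can be checked approximately by Monte Carlo integration over Haar-distributed unitary matrices, assuming NumPy. Two signatures are used. For m = (1), $Z_{(1)} = \operatorname{tr}$. For m = (1, 1), the sketch assumes the complex identification $Z_{(1,1)} = e_2$, the second elementary symmetric function of the eigenvalues, written as $((\operatorname{tr} x)^2 - \operatorname{tr}(x^2))/2$. The matrices s and t are arbitrary illustrative choices.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed n x n unitary: QR of a complex Gaussian matrix with phase fix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def z_1(x):
    """Z_(1)(x) = tr x."""
    return np.trace(x).real

def z_11(x):
    """Assumed Z_(1,1)(x) = e_2(eigenvalues of x) = ((tr x)^2 - tr(x^2)) / 2."""
    tr = np.trace(x)
    return ((tr ** 2 - np.trace(x @ x)) / 2).real

rng = np.random.default_rng(5)
n = 3
s = np.array([[3, 1j, 0], [-1j, 2, 0], [0, 0, 1]])   # Hermitian positive definite
t = np.diag([1.0, 2.0, 3.0])
eye = np.eye(n)
draws = [haar_unitary(n, rng) for _ in range(20000)]

results = {}
for z in (z_1, z_11):
    # left side: Monte Carlo estimate of the integral of Z_m(s k^H t k) over K
    mc = np.mean([z(s @ k.conj().T @ t @ k) for k in draws])
    # right side: Z_m(s) Z_m(t) / Z_m(I)
    exact = z(s) * z(t) / z(eye)
    results[z.__name__] = (mc, exact)
    print(z.__name__, mc, exact)
```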


Appendix  H 
SOME GROUP THEORY

The purpose of this appendix is to provide a minimal background in the concepts from group theory, group representation theory, and topological group theory needed to follow the development of zonal polynomials. This body of theory is not part of the usual preparation of acousticians, engineers, or statisticians, yet the future of acoustic signal processing is grounded in these concepts. The material presented here barely suffices to provide some basic definitions. Fluent use of these concepts requires a graduate sequence of two or three courses. With judicious topic selection, an applied graduate course for engineers could be constructed and learned in one semester. We conclude with an example that establishes some group invariance properties of the vector complex normal distribution, which justifies our use of the zonal polynomial approach.

H.1 Basic Group Theory

Definition 10 A group G is a set G, together with an operator □, that obeys the rules below.

1. □ is a binary operator such that if $a \in G$ and $b \in G$, then $a \,\square\, b \in G$.

2. □ is an associative operator. If $a, b, c \in G$, then

$$(a \,\square\, b) \,\square\, c = a \,\square\, (b \,\square\, c)$$

Thus, it makes sense to write $a \,\square\, b \,\square\, c$.

3. There is an element $e_\square \in G$ such that for all $a \in G$ we have

$$a \,\square\, e_\square = e_\square \,\square\, a = a$$

It is a theorem of group theory that there is only one such element in G. We call this element the identity element of (G, □).

4. For each element $a \in G$, there is an element $b \in G$ such that

$$a \,\square\, b = b \,\square\, a = e_\square$$

It is a theorem of group theory that for any element $a \in G$, there is only one element $b \in G$ for which this is true. We call b the inverse of a, and we write $b = a^{-1}$.
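For finite sets, the four rules of Definition 10 can be checked by brute force. The sketch below is plain Python. The function name `is_group` and the example operators are illustrative.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of Definition 10: closure, associativity, identity, inverses."""
    elems = list(elements)
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return False  # rule 1 fails: the set is not closed under op
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(elems, repeat=3)):
        return False  # rule 2 fails: op is not associative
    identities = [e for e in elems if all(op(a, e) == a == op(e, a) for a in elems)]
    if len(identities) != 1:
        return False  # rule 3 fails: no identity element
    e = identities[0]
    return all(any(op(a, b) == e == op(b, a) for b in elems) for a in elems)  # rule 4

print(is_group(range(6), lambda a, b: (a + b) % 6))      # True: integers mod 6 under addition
print(is_group(range(1, 6), lambda a, b: (a * b) % 6))   # False: 2 * 3 = 0 (mod 6) leaves the set
print(is_group(range(1, 7), lambda a, b: (a * b) % 7))   # True: nonzero residues mod 7
```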

To remind us of this association, we can denote the group by $(G, \square)_g$. The subscript g identifies (G, □) as a group. When the context unambiguously refers to the group, the notation may be simplified to (G, □), or simply G. The symbol □ was chosen to separate the operator from our everyday notions of particular operators, so that we can more easily think of general operators. The concept of a group extends well beyond ordinary addition of real numbers, or multiplication on the set of real numbers with the zero element removed. You may choose any other symbol, and the rules still apply. Another definition important to this thesis is that of a subgroup.

Definition 11 Let (G, □) be a group. Then (H, □) is a subgroup of (G, □) if $H \subset G$ and (H, □) is a group.


H.2  Group  Representation  Theory 

H.2.1  Group  Representation  Definition 

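Definition 12 A representation of a group (G, ∘) is some other group (B, □) related to (G, ∘) by a homomorphism φ. Thus φ(g) is a representation of $g \in G$ if

$$\varphi(g \circ h) = \varphi(g) \,\square\, \varphi(h)$$

for all $g, h \in G$, where $\varphi : G \to B$, $\varphi(g) \in B$, and $\varphi(h) \in B$.

Let $I \in B$ be the identity element in (B, □), and let $e \in G$ be the identity element in (G, ∘). Then

$$\varphi(g) = \varphi(e \circ g) = \varphi(e) \,\square\, \varphi(g)$$

$$I = \varphi(g) \,\square\, [\varphi(g)]^{-1} = \varphi(e) \,\square\, \varphi(g) \,\square\, [\varphi(g)]^{-1} = \varphi(e)$$

Therefore $I = \varphi(e)$, which means that the identity in (G, ∘) maps to the identity in (B, □). A related result is

$$I = \varphi(e) = \varphi(g \circ g^{-1}) = \varphi(g) \,\square\, \varphi(g^{-1})$$

$$[\varphi(g)]^{-1} = [\varphi(g)]^{-1} \,\square\, I = [\varphi(g)]^{-1} \,\square\, \varphi(g) \,\square\, \varphi(g^{-1}) = \varphi(g^{-1})$$

Therefore the inverse $[\varphi(g)]^{-1} \in B$ is the image of $g^{-1} \in G$.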

H.2.2  Homomorphism  Familiar  Examples 

A common example of a group homomorphism is

$$\exp(x + y) = \exp(x) \cdot \exp(y) \qquad (H.1)$$

In this example, x and y belong to the set of complex numbers, which forms a group with ordinary addition as the group operator. The set of numbers $\{e^x\}$ is the set of nonzero complex numbers, which forms a group under ordinary multiplication. In this example, $\varphi(x) = e^x$. Saracino (pp. 106-108) [231] gives other examples.

Let (A, ∘) be the group of invertible square matrices of complex numbers, GL(n, C). Then det(a) defines a group homomorphism from (A, ∘) to the set of nonzero complex numbers under multiplication. Note that

$$\det(a_1 \circ a_2) = \det(a_1) \times \det(a_2) \qquad (H.2)$$

Let (B, ⊕) be the group of square matrices of complex numbers with matrix addition as the group operator. Then tr(b) defines a group homomorphism from (B, ⊕) to the set of complex numbers (including zero) under addition. Note that

$$\operatorname{tr}(b_1 \oplus b_2) = \operatorname{tr}(b_1) + \operatorname{tr}(b_2) \qquad (H.3)$$
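The three homomorphisms above can be spot-checked numerically with random inputs, assuming NumPy and the standard-library `cmath` module. The same check confirms the two results derived earlier: the identity maps to the identity, and inverses map to inverses.

```python
import cmath
import numpy as np

rng = np.random.default_rng(6)
x, y = complex(0.3, -1.2), complex(-0.7, 2.5)
exp_ok = cmath.isclose(cmath.exp(x + y), cmath.exp(x) * cmath.exp(y))            # (H.1)

a1 = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
a2 = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
det_ok = np.isclose(np.linalg.det(a1 @ a2), np.linalg.det(a1) * np.linalg.det(a2))  # (H.2)
tr_ok = np.isclose(np.trace(a1 + a2), np.trace(a1) + np.trace(a2))                   # (H.3)

# The identity maps to the identity and inverses map to inverses.
id_ok = np.isclose(np.linalg.det(np.eye(3)), 1.0) and cmath.exp(0) == 1
inv_ok = np.isclose(np.linalg.det(np.linalg.inv(a1)), 1 / np.linalg.det(a1))
print(exp_ok, det_ok, tr_ok, id_ok, inv_ok)   # True True True True True
```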

H.2.3  Homomorphism  Theorems 

The structure imposed by group homomorphisms permits many powerful and useful insights. A list of theorems (without proof) from Saracino [231] is presented below.

Theorem 99 Let $\varphi : G \to H$ and $\psi : H \to K$ be homomorphisms. Then $\psi \circ \varphi : G \to K$ is a homomorphism, where ∘ is composition of functions. This is Saracino theorem 12.1(i).

Theorem 100 Let $\varphi : G \to H$ be a homomorphism. Then

1. $\varphi(e_G) = e_H$, where $e_G$ and $e_H$ are the identity elements in the groups G and H.

2. For any $x \in G$ and any integer n, $\varphi(x^n) = [\varphi(x)]^n$.

3. The notation $o(x) = n$ means that the order of the element x is n, that is, n is the smallest positive integer with $x^n = e_G$. If $o(x) = n$, then $o[\varphi(x)]$ divides n.

This is Saracino theorem 12.4.
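A concrete instance of Theorem 100 uses additive cyclic groups. The sketch below is plain Python and uses the map $\varphi : \mathbb{Z}_{12} \to \mathbb{Z}_4$, $\varphi(x) = x \bmod 4$, which is a homomorphism because 4 divides 12. In additive notation, part 2 reads $\varphi(kx) = k\varphi(x)$. The example groups are illustrative choices.

```python
from math import gcd

N, M = 12, 4          # phi: Z_12 -> Z_4, phi(x) = x mod 4 (a homomorphism since 4 divides 12)

def phi(x):
    return x % M

def order(x, n):
    """Order of x in the additive group Z_n: the smallest k >= 1 with k * x = 0 (mod n)."""
    return n // gcd(x, n)

hom_ok = all(phi((a + b) % N) == (phi(a) + phi(b)) % M for a in range(N) for b in range(N))
identity_ok = phi(0) == 0                                                 # part 1
power_ok = all(phi((k * x) % N) == (k * phi(x)) % M                       # part 2, additively
               for x in range(N) for k in range(-5, 6))
divides_ok = all(order(x, N) % order(phi(x), M) == 0 for x in range(N))   # part 3
print(hom_ok, identity_ok, power_ok, divides_ok)   # True True True True
```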
Theorem 101 Let $\varphi : G \to K$ be a homomorphism. If H is a subgroup of G, then φ(H) is a subgroup of K, where

$$\varphi(H) = \{k \in K \mid k = \varphi(h) \text{ for some } h \in H\}$$

This is Saracino theorem 12.6(i).

Definition 13 Let H be a subgroup of G. Then H is called a normal subgroup if $ghg^{-1} \in H$ for every $g \in G$ and every $h \in H$. We denote this by $H \trianglelefteq G$.

Definition 14 An automorphism is an isomorphism (a bijective homomorphism) $\varphi : G \to G$ that maps a group onto itself.

Theorem 102 A subgroup H of a group G is characteristic if $\varphi(H) \subset H$ for every automorphism φ of G. Every characteristic subgroup is normal. (The converse is false.) This is Saracino problem 12.23.


Definition 15 If $H \trianglelefteq G$, then G/H denotes the set of right (= left) cosets of H in G:

$$G/H = \{Ha \mid a \in G\} \qquad (H.4)$$

where

$$Ha = \{ha \mid h \in H\} \qquad (H.5)$$


Definition 16 If $\varphi : G \to K$ is a homomorphism, then the kernel of φ is

$$\ker(\varphi) = \{g \in G \mid \varphi(g) = e_K\}$$
Theorem 103 For any homomorphism $\varphi : G \to K$,

$$\ker(\varphi) \trianglelefteq G$$

This is Saracino theorem 13.1.

Theorem 104 (Fundamental Theorem on Group Homomorphisms). Let $\varphi : G \to K$ be a homomorphism from G onto K. Then K is isomorphic to $G/\ker(\varphi)$. We use the notation

$$K \cong G/\ker(\varphi) = \{\ker(\varphi)a \mid a \in G\}$$

This is Saracino theorem 13.2.
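Continuing with the illustrative map $\varphi(x) = x \bmod 4$ from $\mathbb{Z}_{12}$ onto $\mathbb{Z}_4$, the sketch below computes the kernel and its cosets in plain Python. It then checks that the induced map $\ker(\varphi) + a \mapsto \varphi(a)$ is well defined and bijective, as Theorem 104 states. Because $\mathbb{Z}_{12}$ is abelian, $g + h - g = h$, so the kernel is normal as Theorem 103 requires.

```python
N, M = 12, 4

def phi(x):
    return x % M      # a homomorphism from Z_12 onto Z_4

kernel = frozenset(g for g in range(N) if phi(g) == 0)                 # Definition 16
cosets = {frozenset((h + a) % N for h in kernel) for a in range(N)}    # G / ker(phi)

# The induced map ker(phi) + a -> phi(a) is well defined and bijective (Theorem 104).
induced = {}
for coset in cosets:
    images = {phi(a) for a in coset}
    assert len(images) == 1      # well defined: phi is constant on each coset
    induced[coset] = images.pop()
print(sorted(kernel), len(cosets), sorted(induced.values()))   # [0, 4, 8] 4 [0, 1, 2, 3]
```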

Theorem 105 (Second Isomorphism Theorem). Let H and K be subgroups of G, and let $K \trianglelefteq G$. Then

$$H/(H \cap K) \cong HK/K$$

This is Saracino theorem 13.4.

Theorem 106 (Third Isomorphism Theorem). Let $H \subseteq K \subseteq G$ with $H \trianglelefteq G$ and $K \trianglelefteq G$. Then

$$(K/H) \trianglelefteq (G/H)$$

and

$$(G/H)/(K/H) \cong G/K$$

This is Saracino theorem 13.5.
H.2.4 Transformation Groups

We are now ready to discuss groups of functions that transform the members of some space. We call such groups transformation groups. This material is taken from Wijsman's monograph [288].

Wijsman (p. 15) assigns a technical meaning to the word "action".


Definition 17 An action of a group G on an arbitrary space A to the left is any function

$$\psi : G \times A \to A$$

with the following properties:

1. For every $g \in G$, $\psi(g, \cdot) : A \to A$ is bijective.

2. $\psi(e, a) = a$ for every $a \in A$.

3. $\psi(g_2, \psi(g_1, a)) = \psi(g_2 g_1, a)$ for every $g_1, g_2 \in G$ and $a \in A$.

Definition 18 If $ga = a$ for every $g \in G$ and $a \in A$, then the action of G is said to be trivial.
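For contrast with the trivial action examined next, here is a nontrivial action: the unitary group U(n) acting on $\mathbb{C}^n$ by left multiplication, $\psi(g, a) = ga$. The sketch assumes NumPy and checks the three properties of Definition 17. Bijectivity is shown by the explicit inverse $g^H$. The helper names are illustrative.

```python
import numpy as np

def act(g, a):
    """psi(g, a) = g a: left multiplication of a vector a in C^n by a matrix g."""
    return g @ a

def random_unitary(n, rng):
    q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return q

rng = np.random.default_rng(7)
n = 3
g1, g2 = random_unitary(n, rng), random_unitary(n, rng)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)

bijective_ok = np.allclose(act(g1.conj().T, act(g1, a)), a)             # property 1: inverse is g^H
identity_ok = np.allclose(act(np.eye(n), a), a)                         # property 2
compat_ok = np.allclose(act(g2, act(g1, a)), act(g2 @ g1, a))           # property 3
nontrivial = not np.allclose(act(g1, a), a)                             # not trivial (Definition 18)
norm_kept = np.isclose(np.linalg.norm(act(g1, a)), np.linalg.norm(a))  # unitary g preserves length
print(bijective_ok, identity_ok, compat_ok, nontrivial, norm_kept)
```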


With  the  previous  discussion  in  mind,  let  us  examine  the  properties  of  the 


mapping  0(5,0)  =  g[a]  =  ga. 
1. The range of $\psi(g, \cdot)$ is

$$\{\psi(g, a), a \in A\} = \{ga, a \in A\} = \{a, a \in A\} = A$$

Therefore, $\psi(g, \cdot)$ is an onto (surjective) function. Suppose $\psi(g, a_1) = \psi(g, a_2)$. Since $\psi(g, a) = ga = a$ for all $a \in A$, we know that

$$a_1 = \psi(g, a_1) = \psi(g, a_2) = a_2$$

for any $a_1, a_2 \in A$. Therefore, $\psi(g, \cdot)$ is one-to-one (injective). Since it is one-to-one and onto, it is bijective.

2. $\psi(e, a) = ea = a$ for all $a \in A$, where $e = (I_m, I_n)$ is the group identity element. We do not have to recompute for $e \in G$, since we already established $\psi(g, a) = a$ for all $g \in G$, which includes $g = e$.

3. For $g_1, g_2 \in G$,

$$\psi(g_2, \psi(g_1, a)) = \psi(g_2, a) = a \in A$$

Then

$$\psi(g_2 g_1, a) = \psi(g_3, a) = a$$

since $g_2 g_1 = g_3 \in G$. Therefore

$$\psi(g_2, \psi(g_1, a)) = \psi(g_2 g_1, a)$$

We thus declare that $\psi(g, a) = ga$ is an action of $G$ on $A$ to the left, and this action is trivial.
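The three conditions of Definition 17 can be checked mechanically when $G$ and $A$ are finite. The sketch below uses an example of our own, the permutation group $S_3$ acting on $A = \{0, 1, 2\}$. It tests the natural permuting action, the trivial action of Definition 18, and a map (apply $g$ twice) that is bijective but fails the compatibility condition.

```python
from itertools import permutations

# Illustrative finite example (our own choice): G = S_3 as tuples p with p[i] the image of i,
# acting on A = {0, 1, 2}.
G = list(permutations(range(3)))
A = [0, 1, 2]
e = (0, 1, 2)

def compose(p, q):
    """Group product p q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def is_left_action(psi):
    bijective = all(sorted(psi(g, a) for a in A) == sorted(A) for g in G)
    identity = all(psi(e, a) == a for a in A)
    compatible = all(psi(g2, psi(g1, a)) == psi(compose(g2, g1), a)
                     for g1 in G for g2 in G for a in A)
    return bijective and identity and compatible

natural = lambda g, a: g[a]        # permute the points of A
trivial = lambda g, a: a           # the trivial action
squared = lambda g, a: g[g[a]]     # bijective, but (g2 g1)^2 != g2^2 g1^2 in S_3

print(is_left_action(natural), is_left_action(trivial), is_left_action(squared))
# True True False
```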
Definition 19 For a given action of $G$ on $A$, the orbit of $a \in A$ is defined as

$$Ga \stackrel{\text{def}}{=} \{ga : g \in G\}$$

The orbits partition $A$. For our situation, $Ga = \{a\}$, so each orbit contains only the single point $a$.

Definition 20 The abstract space whose points are the $G$-orbits is called the orbit space under $G$, and is denoted $A/G$. In our case,

$$A/G = \{Ga, a \in A\} = \{\{a\}, a \in A\}$$

which we identify with $A$.

Definition 21 The orbit projection, $\pi : A \to A/G$, assigns to each $a \in A$ its orbit, $\pi(a) = Ga$. In our case $\pi(a) = \{a\}$, which we identify with $a$.

Let $B \subset A$ and $g \in G$. Then

$$gB \stackrel{\text{def}}{=} \{ga, a \in B\}$$

defines the $g$-translate of $B$. For our case, $gB = B$.

Definition 22 The saturation of $B$ is

$$GB = \bigcup_{g \in G} gB = \{ga : g \in G, a \in B\}$$

For our case, $GB = B$.

Definition 23 A set $B \subset A$ such that $gB = B$ for all $g \in G$ is called invariant. It coincides with its saturation.
Definition 24 Define the isotropy subgroup (stability subgroup) of $G$ at $a$, for arbitrary $a \in A$, to be

$$G_a \stackrel{\text{def}}{=} \{g \in G : ga = a\}$$

In our case, $G_a = G$.
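The trivial action gives singleton orbits and $G_a = G$. A nontrivial finite action shows more clearly what Definitions 19, 22 and 24 compute. The sketch below uses an action of our own choosing: $G = \{+1, -1\}$ under multiplication, acting on $A = \{-2, \ldots, 2\}$ by $g[a] = g \cdot a$.

```python
# Illustrative action (our own choice): sign flips acting on a few integers.
G = [1, -1]
A = [-2, -1, 0, 1, 2]

def act(g, a):
    return g * a

def orbit(a):                      # Definition 19
    return frozenset(act(g, a) for g in G)

def saturation(B):                 # Definition 22: union of the g-translates gB
    return {act(g, a) for g in G for a in B}

def isotropy(a):                   # Definition 24
    return [g for g in G if act(g, a) == a]

orbit_space = {orbit(a) for a in A}               # the orbit space A/G
print(sorted(sorted(o) for o in orbit_space))     # [[-2, 2], [-1, 1], [0]]
print(isotropy(0), isotropy(1))                   # [1, -1] [1]
print(sorted(saturation({1, 2})))                 # [-2, -1, 1, 2]
```

The orbits are disjoint and cover $A$, which is the partition mentioned in the text. A set such as $\{-1, 1\}$ equals its own saturation and is therefore invariant in the sense of Definition 23.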

Definition 25 In addition to $G$ and $A$, suppose we have a space $C$ and a function $\varphi : A \to C$. For $g \in G$, the $g$-translate of $\varphi$, written $g\varphi$, is defined by

$$(g\varphi)(a) \stackrel{\text{def}}{=} \varphi(g^{-1}a)$$

for all $a \in A$.

Since $G$ is a group, $g^{-1} \in G$, and so in our case $g^{-1}a = a$. Thus

$$(g\varphi)(a) = \varphi(g^{-1}a) = \varphi(a) \qquad \text{(H.6)}$$

for all $a \in A$. Thus $g\varphi = \varphi$ for all $g \in G$. Therefore, $\varphi$ is invariant under the left action of $G$ on $A$, since each $a \in A$ forms its own orbit.

Definition 26 If an invariant function assumes different values on distinct orbits, it is called a maximal invariant. For our case, this says that if $\varphi(a_1) \neq \varphi(a_2)$ whenever $a_1 \neq a_2$, then $\varphi$ is a maximal invariant.
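The difference between an invariant and a maximal invariant is clearer on an action with nontrivial orbits. The sketch below uses a small sign-flip action of our own choosing, $G = \{+1, -1\}$ acting on $A = \{-2, \ldots, 2\}$ by $g[a] = g \cdot a$. Under this action $|a|$ is a maximal invariant, $a \bmod 2$ is invariant but not maximal, and the identity map is not invariant.

```python
# Illustrative action (our own choice): G = {+1, -1} acting on A = {-2, ..., 2} by g[a] = g * a.
G = [1, -1]
A = [-2, -1, 0, 1, 2]

def orbit(a):
    return frozenset(g * a for g in G)

def is_invariant(phi):
    # (g phi)(a) = phi(g^{-1} a); here every g is its own inverse.
    return all(phi(g * a) == phi(a) for g in G for a in A)

def is_maximal_invariant(phi):
    # Invariant, and distinct orbits receive distinct values (Definition 26).
    if not is_invariant(phi):
        return False
    return all(phi(a1) != phi(a2) for a1 in A for a2 in A if orbit(a1) != orbit(a2))

print(is_maximal_invariant(abs))             # True: |a| labels each orbit
print(is_invariant(lambda a: a % 2),
      is_maximal_invariant(lambda a: a % 2)) # True False: orbits {0} and {-2, 2} collide
print(is_invariant(lambda a: a))             # False
```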

Definition 27 A function $\nu : A \to C$ is called equivariant if

$$g(\nu(a)) = \nu(ga)$$

for all $g \in G$ and $a \in A$. In our case, $\nu(ga) = \nu(a)$ for all $g \in G$. Recall

$$(g\nu)(a) = \nu(g^{-1}a) = \nu(a) \qquad \text{(H.7)}$$

Let

$$g(\nu(a)) \stackrel{\text{def}}{=} (g\nu)(a)$$

Therefore

$$g(\nu(a)) = (g\nu)(a) = \nu(a) = \nu(ga)$$

for all $g \in G$ and $a \in A$. Thus $\nu$ is equivariant.


H.3  Topology  and  Basic  Measure  Theory 

This section briefly highlights the main points of measure theory and topology in preparation for the introduction to topological group theory. The source for this material is Rudin [230]. Mastery of these concepts is necessary for any serious new work in signal processing.

H.3.1  Topology 

Definition 28 A topological space $(G, \tau)$ is a set $G$ together with a collection $\tau$ of subsets of $G$ with the following properties.

1. $\tau$ contains the empty set $\emptyset$ and also $G$.

2. $\tau$ is closed under finite intersections. If $U_1, \ldots, U_n \in \tau$, then $\bigcap_{k=1}^{n} U_k \in \tau$.

3. $\tau$ is closed under arbitrary unions. If $\{U_\alpha\} \subset \tau$, where the range of the index $\alpha$ is possibly uncountable, then $\bigcup_\alpha U_\alpha \in \tau$.

Definition 29 The subsets of $G$ that are members of $\tau$ are called the open sets with respect to $\tau$. The set complement

$$G \setminus X = X^c$$

of any set $X \in \tau$ is called a closed set.

Definition 30 Let $\varphi$ map topological space $(G, \tau)$ into topological space $(B, \sigma)$. Then $\varphi$ is called continuous if $\varphi^{-1}(X) \in \tau$ for every $X \in \sigma$, where $\varphi^{-1}(X)$ is the preimage of $X$ under the mapping $\varphi$.
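On finite spaces, Definitions 28 and 30 reduce to finite checks. The sketch below uses small spaces of our own choosing. It verifies that a nested collection of sets is a topology and tests continuity by computing preimages of open sets. For a finite collection, closure under pairwise unions and intersections is enough.

```python
from itertools import combinations

def is_topology(X, tau):
    tau = {frozenset(U) for U in tau}
    if frozenset() not in tau or frozenset(X) not in tau:
        return False
    # For a finite collection, pairwise closure implies closure under all unions/intersections.
    return all(U | V in tau and U & V in tau for U, V in combinations(tau, 2))

def is_continuous(f, X, tau, sigma):
    """Continuous if the preimage of every open set of sigma is open in tau (Definition 30)."""
    tau = {frozenset(U) for U in tau}
    return all(frozenset(x for x in X if f(x) in V) in tau for V in sigma)

X = {1, 2, 3}
tau = [set(), {1}, {1, 2}, X]          # a nested (chain) topology on X
Y = {"a", "b"}
sigma = [set(), {"a"}, Y]
f = lambda x: "a" if x == 1 else "b"   # preimage of {"a"} is {1}: open
g = lambda x: "a" if x == 3 else "b"   # preimage of {"a"} is {3}: not open

print(is_topology(X, tau), is_continuous(f, X, tau, sigma), is_continuous(g, X, tau, sigma))
# True True False
```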

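Definition 31 If $g \in G$ and $X \subset G$, where $X$ is not necessarily in $\tau$, then $X$ is a neighborhood of $g$ if there is a $Y \in \tau$ such that $g \in Y \subset X$.

If $(G, \tau)$ and $(B, \sigma)$ are topological spaces, then $(G \times B, \tau \times \sigma)$ is a topological space. Here $\tau \times \sigma$ denotes the product topology, whose open sets are the unions of sets $U \times V$ with $U \in \tau$ and $V \in \sigma$. However, it is possible to define a topology $\nu$ on $G \times B$ with $\nu \not\subset \tau \times \sigma$. This means that a mapping $\varphi$ can be continuous with respect to $\nu$ without being continuous with respect to $\tau \times \sigma$, even though the domain and range of $\varphi$ are the same in both cases.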

Definition 32 Let $(G, \tau)$ be a topological space. Let $K \subset G$, where $K$ is not necessarily in $\tau$. Let $\{B_\alpha\}$ be an arbitrary collection of subsets of $G$ such that $K \subset \bigcup_\alpha B_\alpha$. Then:

1. The collection $\{B_\alpha\}$ is called a cover of $K$.

2. If all of the $B_\alpha$ are in $\tau$, then the collection $\{B_\alpha\}$ is called an open cover of $K$.

3. Let the index set of $\beta$ be a subset of the index set of $\alpha$, such that $K \subset \bigcup_\beta B_\beta$. Then, in the general case, $\{B_\beta\}$ is called a subcover of $K$ with respect to $\{B_\alpha\}$.

4. When each $B_\beta \in \tau$, then $\{B_\beta\}$ is called an open subcover.

5. If the number of subsets in the collection $\{B_\beta\}$ is finite, then that collection is a finite subcover, and it is a finite open subcover when each $B_\beta \in \tau$.

Definition  33  Set  K  is  called  compact  if  every  open  cover  of  K  contains  a 
finite  open  subcover  of  K. 

Definition 34 If $B \subset G$, then the closure $\bar{B}$ of $B$ is the smallest closed set with respect to $\tau$ that contains $B$. Thus $B \subset \bar{B} \subset G$.

Definition 35 $G$ is called locally compact if every $g \in G$ has a neighborhood $B$ whose closure $\bar{B}$ is compact.

Definition 36 $G$ is called Hausdorff if any two distinct points $g, h \in G$ have disjoint neighborhoods. When $G$ is Hausdorff, its points are said to be separated by open sets.


H.3.2  Measure  Theory 
Definition 37 A measurable space $(G, \mathcal{M})$ is a set $G$ and a collection $\mathcal{M}$ of subsets of $G$, where $\mathcal{M}$ is called a $\sigma$-algebra, and $\mathcal{M}$ has the following properties.

1. $\mathcal{M}$ contains the empty set $\emptyset$.

2. If $B \in \mathcal{M}$, then the set complement $G \setminus B = B^c$ is also in $\mathcal{M}$.

3. $\mathcal{M}$ is closed under countable intersections. If $\{B_k\}_{k \in \mathbb{N}} \subset \mathcal{M}$, then

$$\bigcap_{k \in \mathbb{N}} B_k \in \mathcal{M}$$

Definition 38 Any set $B \in \mathcal{M}$ is called a measurable set.

Definition 39 Let $\varphi$ map measurable space $(G, \mathcal{M})$ into topological space $(B, \sigma)$. Then $\varphi$ is called a measurable function if $\varphi^{-1}(X) \in \mathcal{M}$ for every $X \in \sigma$.

Theorem 107 Let $\eta$ be an arbitrary family of subsets of $G$. Then there exists a smallest $\sigma$-algebra $\mathcal{M}_0$ containing $\eta$, in the sense that $\mathcal{M}_0 \subset \mathcal{M}$ for every $\sigma$-algebra $\mathcal{M}$ that contains $\eta$. If $\Omega$ is the family of all $\sigma$-algebras $\mathcal{M}$ which contain $\eta$, then $\mathcal{M}_0 = \bigcap_{\mathcal{M} \in \Omega} \mathcal{M}$ (Ocneanu [196]). $\mathcal{M}_0$ is called the $\sigma$-algebra generated by $\eta$.
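On a finite set, countable operations reduce to finite ones. The $\sigma$-algebra generated by a family $\eta$ (Theorem 107) can then be found by repeatedly adding complements and pairwise unions until nothing new appears. The set and generating family below are our own illustration.

```python
def generated_sigma_algebra(G, eta):
    """Smallest family containing eta, the empty set and G, closed under complement and union."""
    G = frozenset(G)
    sets = {frozenset(), G} | {frozenset(E) for E in eta}
    while True:
        new = {G - S for S in sets} | {S | T for S in sets for T in sets}
        if new <= sets:
            return sets        # closed: intersections follow from complements and unions
        sets |= new

G = {1, 2, 3, 4}
M0 = generated_sigma_algebra(G, [{1}, {1, 2}])
print(sorted(sorted(S) for S in M0))
# [[], [1], [1, 2], [1, 2, 3, 4], [1, 3, 4], [2], [2, 3, 4], [3, 4]]
```

The result consists of all unions of the "atoms" $\{1\}$, $\{2\}$ and $\{3, 4\}$. The set $\{3\}$ is not in it, because no $\sigma$-algebra is forced to separate 3 from 4 by these generators.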

Definition 40 Let $(G, \tau)$ be a topological space. Regarding $\tau$ as an arbitrary collection of subsets of $G$, generate a $\sigma$-algebra $\mathcal{M}$ from it. This $\mathcal{M}$ is called the Borel $\sigma$-algebra, $\mathcal{B}$, with respect to $\tau$. Subsets that are contained in $\mathcal{B}$ are called Borel sets. If $\varphi$ is a measurable function whose $\sigma$-algebra is a Borel $\sigma$-algebra, then $\varphi$ is called a Borel measurable function.

Proposition  42  The  composition  of  continuous  functions  yields  a  continuous 
function. 

Proposition  43  A  continuous  function  of  a  measurable  function  produces  a 
measurable  function. 

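Definition 41 Let $(G, \mathcal{M})$ be a measurable space with $\sigma$-algebra $\mathcal{M}$. Let $(\mathcal{F}, \oplus, \odot)$ be a set with an addition $\oplus$ and a multiplication $\odot$, such as a field. A function $\mu : \mathcal{M} \to \mathcal{F}$ is called a measure if:

1. $\mu(\emptyset) = 0$, where $0$ is the identity element of $\oplus$ in $\mathcal{F}$ and $\emptyset$ is the empty set.

2. Let $\{B_k\}$ be a countable collection of disjoint measurable sets, where the index $k$ takes on values in the set of natural numbers, $\mathbb{N} = \{0, 1, 2, \ldots\}$. Then

$$\mu\Big(\bigcup_{k \in \mathbb{N}} B_k\Big) = \bigoplus_{k \in \mathbb{N}} \mu(B_k)$$

This property is called “countable additivity”.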

Definition 42 Let $c$ be a constant in $\mathcal{F}$, and let $1$ be the identity element of $\odot$ in $\mathcal{F}$. Let $\oplus$ and $\odot$ be arithmetic addition and multiplication on the sets to be discussed. When $\mathcal{F} = [0, \infty]$, then $\mu$ is called a positive measure or (more usually) just a measure. When $\mathcal{F} = [0, 1] \subset \mathbb{R}$ and $\mu(G) = 1$, then $\mu$ is called a probability measure. When $G$ is a locally compact Hausdorff space and $\mathcal{M}$ is a Borel $\sigma$-algebra, then $\mu$ is called a Borel measure.

Definition 43 Let $G$ be a locally compact Hausdorff space. Let the $\sigma$-algebra $\mathcal{M}$ contain all the Borel sets in $G$ (thus $\mathcal{B} \subset \mathcal{M}$). Let $\mu$ be a positive measure. Then

1. For $A \subset G$, the quantity

$$\mu(A) = \inf\{\mu(U) : A \subset U,\ U \text{ open}\}$$

is called an outer measure of $A$. A measure $\mu$ with this property for all $A \in \mathcal{M}$ is called outer regular.

2. For $A \subset G$, the quantity

$$\mu(A) = \sup\{\mu(K) : K \subset A,\ K \text{ compact}\}$$

is called an inner measure of $A$. A measure $\mu$ with this property for every open set $A$ and for every $A \in \mathcal{M}$ with $\mu(A) < \infty$ is called inner regular.

3. If $\mu$ is both inner regular and outer regular, then $\mu$ is called regular.

Definition 44 A Radon measure on $G$ is a Borel measure which is finite on compact sets, outer regular on all Borel sets, and inner regular on all open sets.

Definition 45 Let $G$ be a locally compact Hausdorff topological group. A left Haar measure on $G$ is a nonzero Radon measure $\mu$ which satisfies $\mu(gA) = \mu(A)$ for all $g \in G$ and for all open $A$. Similarly, a right Haar measure satisfies $\mu(Ag) = \mu(A)$.

A  few  other  often  encountered  terms  in  measure  theory  are  defined  below. 

Definition 46 If $(G, \mathcal{M}, \mu)$ is a measure space, then a set $B \in \mathcal{M}$ is called a null set if $\mu(B) = 0$.

Definition 47 A measure $\mu$ whose domain $\mathcal{M}$ contains all subsets of null sets is called complete.

Theorem 108 Let $G = \mathbb{R}$ and let $\mathcal{M}$ be the Borel $\sigma$-algebra defined on $G$. Let $\varphi : G \to \mathbb{R}$ be any increasing, right continuous function. Then there is a unique measure $\mu_\varphi$ on $\mathcal{M}$ such that

$$\mu_\varphi((a, b]) = \varphi(b) - \varphi(a)$$

for all $a \le b$ in $G$. If $\psi$ is another such function, then $\mu_\varphi = \mu_\psi$ if and only if $\varphi - \psi$ is a constant. This is Folland Theorem 1.16 [85].
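The interval formula of Theorem 108 is easy to evaluate directly. The sketch below uses three choices of $\varphi$ of our own: the identity, the identity shifted by a constant, and the right-continuous step function $\lfloor x \rfloor$, which places unit mass at each integer. It also illustrates additivity over adjacent half-open intervals.

```python
import math

def mu_interval(phi, a, b):
    """mu_phi((a, b]) = phi(b) - phi(a), for a <= b."""
    if a > b:
        raise ValueError("need a <= b")
    return phi(b) - phi(a)

lebesgue = lambda x: x          # phi(g) = g: Lebesgue measure on intervals
shifted = lambda x: x + 5.0     # differs from the identity by a constant
step = math.floor               # increasing and right continuous

print(mu_interval(lebesgue, 0.0, 2.5))   # 2.5
print(mu_interval(shifted, 0.0, 2.5))    # 2.5: same measure, since phi - psi is constant
print(mu_interval(step, 0.0, 2.5))       # 2: the integers 1 and 2 lie in (0, 2.5]
# (0, 1] and (1, 2.5] are disjoint and make up (0, 2.5]:
print(mu_interval(step, 0.0, 1.0) + mu_interval(step, 1.0, 2.5))   # 2
```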

Definition 48 The completion of this measure when $\varphi(g) = g$ for all $g \in G$ is called the Lebesgue measure, and this measure is usually denoted by $m$. Lebesgue measure on $\mathbb{R}^n$ is the completion of the $n$-fold product of the Lebesgue measure on $\mathbb{R}$.
Definition 49 Now we return to measure space $(G, \mathcal{M}, \mu)$ and group $(\mathcal{F}, \oplus)$ with

$$\mu : \mathcal{M} \to \mathcal{F}$$

1. When

$$\mathcal{F} = \{-\infty\} \cup \mathbb{R} \cup \{+\infty\}$$

and $\oplus$ is addition, then $\mu$ is called a signed measure.

2. When $\mathcal{F} = \mathbb{C}$, the set of complex numbers, with $\oplus$ being addition, then $\mu$ is called a complex measure.

3. When $|\mu(G)| < \infty$, then $\mu$ is called a finite or bounded measure.


Definition 50 Let $\mathcal{M}$ be the power set $P(G)$ of $G$. The power set $P(G)$ is the collection of all possible subsets of $G$:

$$P(G) = \{X \mid X \subset G\}$$

In particular, each singleton $\{g\}$, $g \in G$, is a member of $P(G)$. Two special cases are important. Let $\mu(\{g\}) = f$, where $g \in G$ and $f \in \mathcal{F}$.

1. When $\mathcal{F} = \mathbb{N} = \{0, 1, 2, \ldots\}$, $f = 1$ for every $g \in G$, and $\oplus$ is addition, then $\mu$ is called a counting measure.

2. Let $\mu(\{g_0\}) = f$ for one specific $g_0 \in G$, and $\mu(\{g\}) = h$ for $g \in G$, $g \neq g_0$, where $f, h \in \mathcal{F}$. When $\mathcal{F} = \mathbb{N}$, $f = 1$, and $h = 0$, then $\mu$ is called the Dirac measure, atomic measure, or the point mass at $g_0$. The group operator $\oplus$ is addition.
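On a finite set with $\sigma$-algebra $P(G)$, a measure is fixed by its values on singletons and extended to any subset by additivity, here a finite sum. The sketch below builds the counting measure and the Dirac measure of Definition 50 on a small set of our own choosing.

```python
def measure_from_singletons(weight):
    """Extend singleton weights to all subsets B of a finite G by additivity."""
    return lambda B: sum(weight(g) for g in B)

G = {"a", "b", "c", "d"}
counting = measure_from_singletons(lambda g: 1)
g0 = "b"
dirac = measure_from_singletons(lambda g: 1 if g == g0 else 0)   # point mass at g0

print(counting(set()), counting({"a", "c"}), counting(G))   # 0 2 4
print(dirac({"a", "b"}), dirac({"c", "d"}))                 # 1 0
B1, B2 = {"a"}, {"b", "d"}                                  # disjoint sets
print(counting(B1 | B2) == counting(B1) + counting(B2))     # True
```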

H.4  Topological  Group  Theory 

This discussion is taken from Folland (p. 312 ff) [85], or is motivated by Folland. We begin with a definition.

Definition 51 A topological group $(G, \circ, \tau)$ is a group $(G, \circ)$ with a topology $\tau$ defined on $G$ such that the group operator $\circ$ and the group inverse mapping $g \mapsto g^{-1}$ are continuous with respect to $\tau$.

Let $(G, \tau)$ be a topological space. Then $(G \times G, \tau \times \tau)$ is also a topological space. Let $\circ$ be the mapping $\circ : G \times G \to G$, where $\circ$ is the group operator of the group $(G, \circ)$. Since $\circ$ is continuous, for any $D \in \tau$ its preimage under $\circ$ is open in $\tau \times \tau$. The awkward formal notation is $\circ^{-1}(D) \in \tau \times \tau$ for any $D \in \tau$. The open sets of $\tau \times \tau$ are unions of rectangles $A \times B$ with $A \in \tau$ and $B \in \tau$. So whenever $a \circ b \in D$, there are open sets $A \ni a$ and $B \ni b$ with $A \circ B \subset D$. This is the nicer way of saying that $\circ$ is continuous with respect to $\tau$.

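Definition 52 Let $A, B \subset G$ and $g \in G$. Then

1. $gA = g \circ A = \{g \circ a \mid a \in A\}$.

2. $Ag = A \circ g = \{a \circ g \mid a \in A\}$.

3. $A^{-1} = \{a^{-1} \mid a \in A\}$, where $a^{-1}$ is the element in $G$ that is the group inverse of $a$.

4. $AB = A \circ B = \{a \circ b \mid a \in A, b \in B\}$.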

Definition 53 If $A \subset G$ and $A = A^{-1}$, then $A$ is called symmetric.

Note that $A^{-1}$ is the set of group element inverses

$$\{g^{-1} \mid g \in A\} \stackrel{\text{def}}{=} A^{-1}$$

Note that the conditions $A \subset G$ and $A = A^{-1}$ are not sufficient to make $A$ a subgroup of $G$. For example, let $A$ be all elements of $G$ except the group identity element $e$. Then $A \subset G$ and $A = A^{-1}$, and yet $A$ is not a group. Note also that $AA^{-1}$ is not a set consisting of only the group identity element:

$$AA^{-1} = \{a \circ b^{-1} \mid a, b \in A \subset G\}$$

In the general case of $A$ being a subset of $G$ (but not a subgroup of $G$), we cannot guarantee that $e \in A$. Being a subset is different from being a subgroup. We cannot even claim that

$$\{a \circ b \mid a, b \in A\} \subset A$$

When $A = A^{-1}$, then

$$A^{-1} = \{a^{-1} \mid a \in A\} = A$$

implies $a^{-1} \in A$ for every $a \in A$. We cannot claim

$$\{a \circ b^{-1} \mid a, b \in A\} \subset A$$

even though $\{b^{-1} \mid b \in A\} = A$. In our example of $A = G \setminus \{e\}$, when $a = b$, then $a \circ b^{-1} = e \notin A$.
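The example $A = G \setminus \{e\}$ can be made concrete in a group of our own choosing, $G = \mathbb{Z}_6$ under addition mod 6. There $e = 0$ and the inverse of $a$ is $-a \bmod 6$. The sketch confirms that $A$ is symmetric, that $A$ is not closed under the group operation, and that $AA^{-1}$ contains $e$.

```python
n = 6
G = set(range(n))
e = 0
inv = lambda a: (-a) % n
op = lambda a, b: (a + b) % n

A = G - {e}
A_inv = {inv(a) for a in A}
AA = {op(a, b) for a in A for b in A}
A_Ainv = {op(a, inv(b)) for a in A for b in A}

print(A == A_inv)      # True: A is symmetric
print(AA <= A)         # False: e.g. 1 + 5 = 0 is not in A
print(e in A_Ainv)     # True: a o a^{-1} = e, which is not in A
```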

Some  basic  properties  of  topological  groups  are  given  by  Folland,  as  follows. 

Proposition 44 This is due to Folland [85]. Let $G$ be a topological group. Then for $(G, \circ, \tau)$:

1. The topology of $G$ is translation invariant. Thus, if $U \in \tau$ and $g \in G$, then $g \circ U \in \tau$ and $U \circ g \in \tau$.

2. For every neighborhood $A$ of $e$, there is a symmetric neighborhood $B$ of $e$ with $B \subset A$.

3. For every neighborhood $A$ of $e$, there is a neighborhood $B$ of $e$ with $BB \subset A$.

4. If $(H, \circ)$ is a subgroup of $(G, \circ)$, then so is its closure $(\bar{H}, \circ)$.

5. Every open subgroup $(A, \circ)$, $A \in \tau$, of $(G, \circ)$ is also closed.

6. If $A$ and $B$ are compact subsets of $G$, then $AB$ is also a compact subset of $G$.

I have lost track of the source of the following definitions.

Definition 54 Let $(G, \circ, \tau)$ and $(H, \circ, \sigma)$ be topological groups having the same group operator $\circ$, with $H \subset G$. $H$ is a topological subgroup of $G$ if $\sigma \subset \tau$. Usually, $\sigma = \{U \cap H : U \in \tau\}$, the relative topology.
Definition 55 Let $H$ be a topological subgroup of $G$. If $H$ is compact, then $H$ is a compact subgroup of $G$.

Definition 56 Let $H$ be a compact subgroup of $G$. Then $H$ is a maximal compact subgroup of $G$ if there is no other compact subgroup $A$ of $G$ that contains $H$. Note that $G$ can possibly have more than one maximal compact subgroup.

Definition 57 Let $a, b \in G$ be fixed elements. Let $\varphi$ be a continuous function on the topological group $G$. Let $\|\cdot\|$ be a norm on this space of functions. Let $g \in G$ be an arbitrary element. Then

1. The function defined by $(L_a\varphi)(g) \stackrel{\text{def}}{=} \varphi(a^{-1} \circ g)$ is called the left translate of $\varphi$ through $a$, and $L_{a \circ b} = L_a L_b$. $\varphi$ is called left uniformly continuous if for every $\epsilon > 0$ there is a neighborhood $V$ of $e$ such that $\|L_a\varphi - \varphi\| < \epsilon$ for $a \in V \subset G$, where $e$ is the group identity in $G$.

2. The function defined by $(R_a\varphi)(g) \stackrel{\text{def}}{=} \varphi(g \circ a)$ is called the right translate of $\varphi$ through $a$, and $R_{a \circ b} = R_a R_b$. $\varphi$ is called right uniformly continuous if for every $\epsilon > 0$ there is a neighborhood $V$ of $e$ such that $\|R_a\varphi - \varphi\| < \epsilon$ for $a \in V \subset G$.
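The composition rules $L_{a \circ b} = L_a L_b$ and $R_{a \circ b} = R_a R_b$ in Definition 57 depend on the $a^{-1}$ in the left translate. The sketch below checks them on the finite non-abelian group $S_3$ with a test function $\varphi$ of our own choosing. It also shows that the reversed product $L_{b \circ a}$ does not match $L_a L_b$.

```python
from itertools import permutations

G = list(permutations(range(3)))   # S_3 as permutation tuples

def compose(p, q):
    """p o q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def L(a, phi):
    return lambda g: phi(compose(inverse(a), g))   # (L_a phi)(g) = phi(a^{-1} o g)

def R(a, phi):
    return lambda g: phi(compose(g, a))            # (R_a phi)(g) = phi(g o a)

phi = lambda g: 100 * g[0] + 10 * g[1] + g[2]      # takes a distinct value on each element

left_ok = all(L(compose(a, b), phi)(g) == L(a, L(b, phi))(g) for a in G for b in G for g in G)
right_ok = all(R(compose(a, b), phi)(g) == R(a, R(b, phi))(g) for a in G for b in G for g in G)
swapped = all(L(compose(b, a), phi)(g) == L(a, L(b, phi))(g) for a in G for b in G for g in G)
print(left_ok, right_ok, swapped)   # True True False
```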

Proposition 45 If $\varphi \in C_c(G)$, the space of continuous complex-valued functions $\varphi(g)$ on $G$ with compact support, then $\varphi$ is left and right uniformly continuous.

We  continue  with  Folland’s  discussion  of  the  Haar  measure. 


Proposition 46 Let $G$ be a locally compact Hausdorff topological group. Let

$$C_c^+ \stackrel{\text{def}}{=} \{\varphi \in C_c(G) : \varphi \ge 0 \text{ and } \|\varphi\| > 0\}$$

Then

1. A Radon measure $\mu$ on $G$ is a left Haar measure if and only if

$$\tilde{\mu}(A) \stackrel{\text{def}}{=} \mu(A^{-1}), \quad A \subset G \text{ open}$$

is a right Haar measure.

2. A nonzero Radon measure $\mu$ on $G$ is a left Haar measure if and only if

$$\int \varphi \, d\mu = \int L_a\varphi \, d\mu$$

for all $\varphi \in C_c^+$ and $a \in G$.

3. If $\mu$ is a left Haar measure on $G$, then $\mu(U) > 0$ for all nonempty open $U \subset G$, and $\int \varphi \, d\mu > 0$ for all $\varphi \in C_c^+$.

4. If $\mu$ is a left Haar measure on $G$, then $\mu(G) < \infty$ if and only if $G$ is compact.

Theorem  109  Every  locally  compact  Hausdorff  topological  group  possesses  a 
left  Haar  measure.  This  is  Folland  [85]  theorem  10.5. 

Theorem 110 If $\mu$ and $\nu$ are left Haar measures on $G$, then there exists $c > 0$ such that $\mu = c\nu$. This is a theorem in Chapter 10 of Folland [85].

Proposition 47 Left and right Haar measures are mutually absolutely continuous. This is Folland [85] theorem 10.18.

Theorem 111 Let $G$ be a compact Hausdorff topological group. Then there is a unique real-valued function $I$, called the Haar integral, defined for continuous real-valued functions on $G$, such that:

1. $I(\varphi_1 + \varphi_2) = I(\varphi_1) + I(\varphi_2)$.

2. $I(c\varphi) = cI(\varphi)$, where $c \in \mathbb{R}$.

3. If $\varphi(g) \ge 0$ for all $g \in G$, then $I(\varphi) \ge 0$.

4. $I(1) = 1$, where $1$ denotes the constant function with value one.

5. $I(R_a\varphi) = I(\varphi) = I(L_a\varphi)$ for all $a \in G$. The notation $\int \varphi(g)\,dg$ is often used for $I(\varphi)$. For example, this property can be written as

$$\int \varphi(g \circ a)\,dg = \int \varphi(g)\,dg = \int \varphi(a^{-1} \circ g)\,dg$$

This is Bredon [43] theorem 3.1.
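A finite group with the discrete topology is compact Hausdorff. On such a group the Haar integral of Theorem 111 is simply the average $I(\varphi) = |G|^{-1} \sum_g \varphi(g)$, because translation only permutes the terms of the sum. The sketch below checks the normalization and both invariances on $S_3$, using exact rational arithmetic and a test function of our own choosing.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))   # S_3 as permutation tuples

def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def haar_integral(phi):
    """Normalized Haar integral on a finite group: the average over all elements."""
    return Fraction(sum(phi(g) for g in G), len(G))

phi = lambda g: g[0] ** 2 + 3 * g[1]
print(haar_integral(lambda g: 1))   # 1  (property 4)
base = haar_integral(phi)
right = all(haar_integral(lambda g: phi(compose(g, a))) == base for a in G)
left = all(haar_integral(lambda g: phi(compose(inverse(a), g))) == base for a in G)
print(base, right, left)            # 14/3 True True
```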


H.5  Matrix  Groups 
There are some standard groups in the theory of matrix groups. The following were taken from Curtis [64]. You are already familiar with the notation $\mathbb{R}$ for the set of all real numbers and $\mathbb{C}$ for the set of all complex numbers. The next level of generalization is the set of quaternions, denoted $\mathbb{H}$. For the sake of generality, we let $K \in \{\mathbb{R}, \mathbb{C}, \mathbb{H}\}$.

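Definition 58 $M_n(K)$ is the vector space of $n \times n$ matrices with entries in $K$, over $(K, +, \cdot)$. Scalar multiplication distributes over addition:

$$b(x + y) = bx + by$$

where $b \in K$ and $x, y \in M_n(K)$. Since $M_n(K)$ also carries a matrix multiplication compatible with these operations, it is called an algebra.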

Definition 59 If $A$ is an algebra, then $x \in A$ is a unit if there is some $y \in A$ such that

$$xy = yx = 1$$

That is, $x$ is a unit if it has a multiplicative inverse. If $A$ is an algebra with associative multiplication and $U \subset A$ is the set of units in $A$, then $U$ is a group under multiplication.

The group of units in the algebra consisting of $n \times n$ matrices over $K$ is denoted by $GL(n, K)$ and is called the general linear group. Another notation used is $M_n(K) = GL(n, K)$. That is, when the notation $M_n(\cdot)$ is used, it often refers to the set of nonsingular $n \times n$ matrices, as in $M_n(\mathbb{C})$ when $K = \mathbb{C}$. Some special groups are the class of orthogonal groups $O(n, K)$, defined by

$$O(n, K) = \{A \in M_n(K) \mid \langle Ax, Ay \rangle = \langle x, y \rangle \text{ for all } x, y \in K^n\}$$

When $A \in O(n, K)$, then $AA^H = I_n$ and $A^H A = I_n$. Here, the notation specializes. The notation for orthogonal matrix groups is given in table H.1.

Table H.1. Notation for Types of Orthogonal Groups

Notation                          Group Name
$O(n, \mathbb{R}) = O(n)$         Orthogonal group
$O(n, \mathbb{C}) = U(n)$         Unitary group
$O(n, \mathbb{H}) = Sp(n)$        Symplectic group

The determinant of $A \in O(n)$ is $\det A \in \{-1, 1\}$, and that of $B \in U(n)$ is

$$\det B \in \{e^{i\theta} \mid \theta \in \mathbb{R}\}$$

When attention is restricted to those cases where $\det A = 1$ and $\det B = 1$, special groups are formed. Notation for these special groups is given in table H.2.

Let $A = A^H \in M_n(\mathbb{C})$. Can we make a group from the set of Hermitian matrices? Let

$$\mathcal{H} = \{A \in M_n(\mathbb{C}) \mid A = A^H\}$$
Table H.2. Special Orthogonal Groups

Notation                                      Group Name
$\{A \in O(n) \mid \det A = 1\} = SO(n)$      Special Orthogonal Group
$\{B \in U(n) \mid \det B = 1\} = SU(n)$      Special Unitary Group

Then $\mathcal{H}$ is the set of all $n \times n$ Hermitian matrices. (Recall that $\mathbb{H}$ is the set of quaternions. The font style is significant in the notation.) Let us see if $(\mathcal{H} \setminus 0, \cdot)$ is a group, where the operation is ordinary matrix multiplication. The set $\mathcal{H} \setminus 0$ is the set $\mathcal{H}$ with the zero matrix removed. Suppose


Both $A$ and $B$ are in $\mathcal{H} \setminus 0$. However,


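$$C = AB = \begin{pmatrix} 1 + i & 1 \\ 1 - i & -i \end{pmatrix}$$

is not in $\mathcal{H} \setminus 0$, since its diagonal entries are not real. Thus $(\mathcal{H} \setminus 0, \cdot)$ cannot be a group, because it is not closed under matrix multiplication.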
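The sketch below gives one pair of Hermitian matrices whose product is the matrix $C$ shown above, and checks the three Hermitian properties. The pair $A = \begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ is our own illustrative choice; any Hermitian pair with $AB = C$ would make the same point.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def conj_transpose(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def is_hermitian(X):
    return X == conj_transpose(X)

A = [[1, 1j], [-1j, 1]]     # illustrative choice
B = [[1, 1], [1, 0]]        # illustrative choice
C = matmul(A, B)

print(is_hermitian(A), is_hermitian(B))         # True True
print(C == [[1 + 1j, 1], [1 - 1j, -1j]])        # True
print(is_hermitian(C))                          # False: C[0][0] = 1 + i is not real
```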


Lemma 24 If $A$ and $B$ are Hermitian, then $AB$ is Hermitian if and only if $AB = BA$. This is Nomizu exercise 8.4-1 [193].

Proof. Let $A = A^H$ and $B = B^H$. Then

$$(AB)^H = B^H A^H = BA$$

The statement that $AB$ is Hermitian means that $(AB)^H = AB$. Thus $AB$ is Hermitian if and only if $AB = BA$, or equivalently if $AB - BA = 0$, which is a form familiar to those who have studied Lie theory.

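To explore further, suppose we consider the set of Hermitian nonnegative definite matrices. From Johnson [124] we have the example

$$A = \begin{pmatrix} 1 & 3 \\ 3 & 10 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -3 \\ -3 & 10 \end{pmatrix}$$

and

$$AB = \begin{pmatrix} -8 & 27 \\ -27 & 91 \end{pmatrix}$$

Both $A$ and $B$ are Hermitian (symmetric) positive definite. However, $AB$ is not definite, positive or negative. For real $x \neq 0$, $x^T AB x = -8x_1^2 + 91x_2^2$. If $|x_1| > \sqrt{91/8}\,|x_2|$, then $x^T AB x < 0$. If $|x_1| < \sqrt{91/8}\,|x_2|$, then $x^T AB x > 0$.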
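The Johnson example can be checked directly. The sketch below relies on a standard fact: a real symmetric $2 \times 2$ matrix is positive definite exactly when its $(1,1)$ entry and its determinant are both positive. It then evaluates the quadratic form of the non-symmetric product $AB$ at two test vectors of our own choosing.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def is_pos_def_2x2(M):
    """Symmetric, positive (1,1) entry, positive determinant (Sylvester's criterion)."""
    return M[0][1] == M[1][0] and M[0][0] > 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] > 0

def quad_form(M, x):
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

A = [[1, 3], [3, 10]]
B = [[1, -3], [-3, 10]]
AB = matmul(A, B)
print(AB)                                            # [[-8, 27], [-27, 91]]
print(is_pos_def_2x2(A), is_pos_def_2x2(B))          # True True
print(quad_form(AB, [1, 0]), quad_form(AB, [0, 1]))  # -8 91: AB is indefinite
```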

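Here are two more examples of structures that fail to form a group. Let $X$ be the set of all $n \times n$ Hermitian matrices that are nonsingular. Let $A, B \in X$. Now, define $B \circ A = B^H A B$. The operator $\circ$ is a binary operator on $X$, since

$$(B^H A B)^H = B^H A^H B = B^H A B$$

and $B^H A B$ is nonsingular, so $B^H A B \in X$. Let us see if it is associative.

$$C \circ (B \circ A) = C \circ (B^H A B) = C^H B^H A B C = CBABC \quad \text{if } B = B^H,\ C = C^H$$

$$(C \circ B) \circ A = (C^H B C) \circ A = (C^H B C)^H A (C^H B C) = CBCACBC \quad \text{if } B = B^H,\ C = C^H$$

Thus, in general,

$$C \circ (B \circ A) \neq (C \circ B) \circ A$$

For example, with $C = 2I_n$ the left side is $4BAB$ and the right side is $16BAB$. Therefore $(X, \circ)$ is not a group, because $\circ$ is not an associative operator. Similarly, $(X, \square)$ with $A \,\square\, B = B^H A B$ is not a group, because

$$C \,\square\, (B \,\square\, A) = C \,\square\, (A^H B A) = (A^H B A)^H C (A^H B A) = ABACABA, \quad \text{for } A = A^H,\ B = B^H,\ C = C^H$$

$$(C \,\square\, B) \,\square\, A = A^H (C \,\square\, B) A = A^H (B^H C B) A = ABCBA, \quad \text{for } A = A^H,\ B = B^H,\ C = C^H$$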
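A quick numerical check confirms that $B \circ A = B^H A B$ is not associative. The sketch below uses $2 \times 2$ nonsingular Hermitian matrices of our own choosing, with $C = 2I$, and finds that $(C \circ B) \circ A$ is four times $C \circ (B \circ A)$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def conj_transpose(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def circ(B, A):
    """B o A = B^H A B."""
    return matmul(matmul(conj_transpose(B), A), B)

A = [[2, 1j], [-1j, 3]]      # Hermitian, det = 5 (illustrative)
B = [[1, 0], [0, -1]]        # Hermitian, det = -1 (illustrative)
C = [[2, 0], [0, 2]]         # C = 2I

left = circ(C, circ(B, A))   # C o (B o A) = 4 BAB
right = circ(circ(C, B), A)  # (C o B) o A = 16 BAB
print(left == right)                                                            # False
print(all(right[i][j] == 4 * left[i][j] for i in range(2) for j in range(2)))  # True
```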

We will want to take advantage of properties of topological groups. In preparation, the following remarks are taken from Bredon (p. 5 ff) [43]. The following groups are topological groups with the relative topology from $M_n(\mathcal{F})$: $GL(n, \mathcal{F})$, $SL(n, \mathcal{F})$, $O(n, \mathcal{F})$, and

$$SO(n, \mathcal{F}) = O(n, \mathcal{F}) \cap SL(n, \mathcal{F})$$

where the field $\mathcal{F}$ may be taken as $\mathbb{C}$ or $\mathbb{R}$. $U(n) = O(n, \mathbb{C})$ is closed in $M_n(\mathbb{C}) \cong \mathbb{C}^{n^2}$. For a matrix $U$, the product $UU^H$ is a continuous function of $U$, and $U(n)$ is the preimage of the single point $I_n$ under this function. Since $UU^H = I_n$, each row of $U$ has unit length, so every entry satisfies $|u_{ij}| \le 1$. Thus $U(n)$ is bounded in $M_n(\mathbb{C})$. Being closed and bounded, $U(n)$ is compact. Because $SU(n)$ is a closed subgroup of $U(n)$, $SU(n)$ is also compact. In fact, Gross and Richards (Section 1.3) [96] remark that $O(n, \mathcal{F})$ is a maximal compact subgroup of $GL(n, \mathcal{F})$.

Let $A, B \in O(n, K)$. Then

$$(AB)^H(AB) = B^H A^H A B = B^H B = I_n, \qquad (AB)(AB)^H = A B B^H A^H = A A^H = I_n$$

Thus $O(n, K)$ is closed under matrix multiplication.
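The facts used above for $U(n)$ can be spot-checked numerically in the case $n = 2$: $UU^H = I$, closure under products, entries bounded by one in modulus, and $|\det U| = 1$. The sketch below uses two unitary matrices of our own choosing, a rotation times a phase and a diagonal phase matrix.

```python
import cmath
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def conj_transpose(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def is_unitary(U, tol=1e-12):
    P = matmul(U, conj_transpose(U))
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol for i in range(2) for j in range(2))

t = 0.7
phase = cmath.exp(0.3j)
U1 = [[phase * math.cos(t), -phase * math.sin(t)], [phase * math.sin(t), phase * math.cos(t)]]
U2 = [[cmath.exp(1.1j), 0], [0, cmath.exp(-0.4j)]]
U3 = matmul(U1, U2)                                          # product of two unitary matrices

print(is_unitary(U1), is_unitary(U2), is_unitary(U3))        # True True True
print(all(abs(u) <= 1 + 1e-12 for row in U3 for u in row))   # True: bounded entries
print([round(abs(det(U)), 12) for U in (U1, U2, U3)])        # [1.0, 1.0, 1.0]
```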


H.6 Group Invariance Property of the Vector Complex Normal Distribution

In this section we establish a group invariance property of the vector complex normal distribution. The work done here is a slight generalization of that done by James [120]. I think it is necessary background for understanding his paper, which revolutionized thinking about the statistical distribution of sample eigenvalues. What is special about the approach given now is the application of the invariance argument to the distribution rather than just to some factor terms of the density function. This is a step towards incorporating a measure-theoretic approach with the group invariance ideas. That combination will lead to an ability to deal with a larger class of distributions, and perhaps to an ability to incorporate these ideas into sensitivity analyses.

We  begin  with  some  abstractions.  We  are  going  to  define  a  set  G  whose 
elements  are  pairs  of  matrices.  A  special  operation  will  be  defined  which  will 
provide  a  rule  for  combining  one  set  element  with  another  set  element.  We 
will  then  see  that  this  set,  with  the  operator,  forms  a  group.  The  next  step 
will  be  to  define  a  set  A  upon  which  elements  of  G  will  act.  We  will  establish 
that  we  have  defined  a  transformation  group  which  justifies  our  use  of  the 
machinery  of  some  topological  group  theory.  We  then  apply  our  findings  to 
the  vector  complex  normal  density  function  to  study  invariance  properties.  In 
the process, we also see the distinction between a mapping and a change of variables crystallized. This section is really the link between the application
and  the  abstract  mathematical  work  required. 

H.6.1  Construction  of  a  Group 

This  work  is  supplied  by  me. 

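Define a set $G$ by

$$G \stackrel{\text{def}}{=} \{g = (L, U) \mid L \in M_m(\mathbb{C}) \text{ nonsingular},\ U \in M_n(\mathbb{C}),\ UU^H = I_n\} \qquad \text{(H.8)}$$

and a binary operator $\circ$ by

$$g_2 \circ g_1 \stackrel{\text{def}}{=} (L_2, U_2) \circ (L_1, U_1) \stackrel{\text{def}}{=} (L_2 L_1, U_1 U_2) \qquad \text{(H.9)}$$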
The identity element is $e = (I_m, I_n)$. The inverse element $g^{-1}$ of $g$ is given by $g^{-1} = (L^{-1}, U^H)$. Then the claim to be tested is that $(G, \circ)$ is a group.

To  show  this,  we  must  show  that  o  is  a  binary  associative  operator,  that  with 
this  operator  there  is  an  associated  element  e  in  G  that  is  an  identity  element, 
and  that  each  element  in  G  has  an  inverse  element  in  G  associated  with  this 
operator. 

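The operator $\circ$ was defined to be a binary operator between group elements $g_1$ and $g_2$. It maps $G \times G$ into $G$: $L_2 L_1$ is nonsingular, and $(U_1 U_2)(U_1 U_2)^H = U_1 U_2 U_2^H U_1^H = I_n$. We now examine whether $\circ$ is associative, as implied by the claim that $(G, \circ)$ is a group. To do this, we have to show

$$(g_3 \circ g_2) \circ g_1 = g_3 \circ (g_2 \circ g_1)$$

for arbitrary elements $g_1, g_2, g_3$ in the set $G$. We apply the definitions and observe the results.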

$$(g_3 \circ g_2) \circ g_1 = ((L_3, U_3) \circ (L_2, U_2)) \circ (L_1, U_1) = (L_3 L_2, U_2 U_3) \circ (L_1, U_1) = (L_3 L_2 L_1, U_1 U_2 U_3)$$

$$g_3 \circ (g_2 \circ g_1) = (L_3, U_3) \circ ((L_2, U_2) \circ (L_1, U_1)) = (L_3, U_3) \circ (L_2 L_1, U_1 U_2) = (L_3 L_2 L_1, U_1 U_2 U_3)$$

From the equality of the two approaches, we can conclude that $\circ$ is an associative operator.

We now examine the element $e$ to see if it really is an identity element. To do this, we must show that $g \circ e = e \circ g = g$.

$$g \circ e = (L, U) \circ (I_m, I_n) = (L I_m, I_n U) = (L, U) = g$$

$$e \circ g = (I_m, I_n) \circ (L, U) = (I_m L, U I_n) = (L, U) = g$$

From this we can conclude that $e = (I_m, I_n)$ is the group identity element.

Next, we verify that each element of $G$ has an inverse which is also a member of $G$. To do this, we must show that for an arbitrary $g \in G$ there is an element $g^{-1} \in G$ such that $g \circ g^{-1} = g^{-1} \circ g = e$.

$$g \circ g^{-1} = (L, U) \circ (L^{-1}, U^H) = (L L^{-1}, U^H U) = (I_m, I_n) = e$$

where $UU^H = I_n$ implies $U^{-1} = U^H$, which in turn implies $U^H U = I_n$. The latter also shows that $(L^{-1}, U^H) \in G$, since $U^H (U^H)^H = U^H U = I_n$.

$$g^{-1} \circ g = (L^{-1}, U^H) \circ (L, U) = (L^{-1} L, U U^H) = (I_m, I_n) = e$$

Thus each element $g$ has an inverse. We conclude that we have a group.

By the theorems from group theory, we know that $e$ is unique, and that for each element $g \in G$ the associated $g^{-1} \in G$ is unique.
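The group axioms verified above can be spot-checked numerically with $m = n = 2$. The sketch below represents $2 \times 2$ complex matrices as nested lists. The particular nonsingular $L$'s and unitary $U$'s are our own illustrative choices. A practical implementation would more likely use a linear algebra library; plain Python keeps the sketch self-contained.

```python
import cmath

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def conj_transpose(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def inv2(X):
    d = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def op(g2, g1):
    """(L2, U2) o (L1, U1) = (L2 L1, U1 U2), as in (H.9)."""
    (L2, U2), (L1, U1) = g2, g1
    return (matmul(L2, L1), matmul(U1, U2))

def inverse(g):
    L, U = g
    return (inv2(L), conj_transpose(U))

def close(g, h, tol=1e-12):
    return all(abs(x - y) < tol
               for X, Y in zip(g, h) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

I2 = [[1, 0], [0, 1]]
e = (I2, I2)
s = 2 ** -0.5
g1 = ([[1, 2j], [0, 3]], [[s, s], [-s, s]])
g2 = ([[2, 0], [1 - 1j, 1]], [[cmath.exp(0.5j), 0], [0, 1]])
g3 = ([[0, 1], [1, 1j]], [[0, 1j], [1j, 0]])

print(close(op(op(g3, g2), g1), op(g3, op(g2, g1))))                    # True: associativity
print(close(op(g1, e), g1) and close(op(e, g1), g1))                    # True: identity
print(close(op(g2, inverse(g2)), e) and close(op(inverse(g2), g2), e))  # True: inverses
```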

H.6.2  Action  Set  Definition 

This  work  is  supplied  by  me. 

We next define a set $A$ and a rule for operating on $A$ by elements of the group $(G, \circ)$. Let $A$ be the set of all multivariate complex normal probability distributions (or measures) $P_Z(\mu, \Sigma)$, where $\Sigma^H = \Sigma > 0$. Denote the multivariate random variable by $Z$, the mean by $\mu$, and the covariance matrix by $\Sigma$. You can consider $(\mu, \Sigma)$ as an index that selects a particular distribution from the set $A$. Define the action of elements of $G$ on elements of $A$ by the set of simultaneous mappings $g[a]$, where $a \in A$ is one specific $P_Z(\mu, \Sigma)$:

$$(L, U)[P_Z(\mu, \Sigma)] = P_{LZU}(L\mu U, L\Sigma L^H) \qquad \text{(H.10)}$$


The traditional expression of this action is the set of simultaneous mappings

$$Z \mapsto LZU = Y, \qquad \mu \mapsto L\mu U = M, \qquad \Sigma \mapsto L\Sigma L^H = S \qquad \text{(H.11)}$$


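The density function exists when $\Sigma$ is nonsingular. For $Z \sim CN_{p,n}(\mu, \Sigma, I)$ we have, by theorem 51,

$$dF_Z(Z) = \frac{1}{\pi^{np} |\Sigma|^n} \operatorname{etr}\left[-(Z - \mu)^H \Sigma^{-1} (Z - \mu)\right] (dZ) = a \qquad \text{(H.12)}$$

This represents just one element $a \in A$.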


H.6.3  Invariance  Demonstration 

This  work  is  supplied  by  me. 

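Now, pick an arbitrary element $g = (L, U) \in G$. When we apply $g$ to $a$, we are picking a new element $b \in A$. Note that this is a mapping, not a change of variables.

$$b = g[a] = (L, U)[dF_Z(Z)] = \frac{1}{\pi^{np} |S|^n} \operatorname{etr}\left[-(Y - M)^H S^{-1} (Y - M)\right] (dY) \qquad \text{(H.13)}$$

$$= \frac{1}{\pi^{np} |L\Sigma L^H|^n} \operatorname{etr}\left[-(Y - L\mu U)^H (L\Sigma L^H)^{-1} (Y - L\mu U)\right] (dY)$$

$$= \frac{1}{\pi^{np} |\Sigma|^n |\det L|^{2n}} \operatorname{etr}\left[-U^H (L^{-1} Y U^{-1} - \mu)^H \Sigma^{-1} (L^{-1} Y U^{-1} - \mu) U\right] (dY)$$

$$= \frac{1}{\pi^{np} |\Sigma|^n |\det L|^{2n}} \operatorname{etr}\left[-(L^{-1} Y U^{-1} - \mu)^H \Sigma^{-1} (L^{-1} Y U^{-1} - \mu)\right] (dY) \qquad \text{(H.14)}$$

The last step uses $\operatorname{etr}(U^H X U) = \operatorname{etr}(X U U^H) = \operatorname{etr}(X)$ when $UU^H = I$.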

This is the new point $b \in A$. We wish to investigate the invariance properties associated with this mapping. To be able to compare, we now change variables from $Y$ to $Z$, while remaining at the location $b$. Let

$$Y = LZU \tag{H.15}$$

By theorem 34 we know the Jacobian of this transformation is

$$J(Y \to Z) = |\det L|^{2n}\,|\det U|^{2p} = |\det L|^{2n},$$

since $UU^H = I$ implies $|\det U|^{2p} = 1$. So,

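$$\begin{aligned}
b &= \frac{1}{\pi^{np}|\Sigma|^n|\det L|^{2n}}\,\mathrm{etr}\left[-(L^{-1}LZUU^{-1} - \mu)^H \Sigma^{-1} (L^{-1}LZUU^{-1} - \mu)\right]|J(Y \to Z)|\,(dZ)\\
&= \frac{1}{\pi^{np}|\Sigma|^n|\det L|^{2n}}\,\mathrm{etr}\left[-(Z - \mu)^H \Sigma^{-1} (Z - \mu)\right]|\det L|^{2n}\,(dZ) && \text{(H.16)}\\
&= \frac{1}{\pi^{np}|\Sigma|^n}\,\mathrm{etr}\left[-(Z - \mu)^H \Sigma^{-1} (Z - \mu)\right](dZ) = a && \text{(H.17)}
\end{aligned}$$

Thus, $A$ is invariant under action by group $(G, \circ)$. We have shown that for any $a \in A$, $ga = a$ for all $g \in G$.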
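The change of variables in (H.13) through (H.17) can be checked numerically. The sketch below is a minimal pure-Python illustration. It assumes $p = n = 2$ and uses arbitrarily chosen values for $\mu$, $\Sigma$, $L$, $U$ and the evaluation point $Z$, none of which come from the text. The helpers `cn_density`, `matmul`, `herm`, `inv2` are hypothetical names introduced here. The check confirms that the density of the image point $b$, evaluated at $Y = LZU$ and multiplied by the Jacobian $|\det L|^{2n}$, reproduces the density $a$ at $Z$.

```python
import math

p, n = 2, 2  # illustrative sizes (assumption): Z is p x n

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def herm(a):
    """Hermitian transpose a^H."""
    return [[complex(x).conjugate() for x in row] for row in zip(*a)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a nonsingular 2 x 2 matrix."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def cn_density(Z, mu, Sigma):
    """pi^{-np} |Sigma|^{-n} etr[-(Z - mu)^H Sigma^{-1} (Z - mu)] for a p x n matrix Z."""
    D = sub(Z, mu)
    q = matmul(matmul(herm(D), inv2(Sigma)), D)  # n x n
    trace = sum(q[i][i] for i in range(n)).real
    return math.exp(-trace) / (math.pi ** (n * p) * abs(det2(Sigma)) ** n)

# Hypothetical parameters: Sigma Hermitian positive definite, L nonsingular, U unitary.
Sigma = [[2, 0.5 - 0.5j], [0.5 + 0.5j, 1]]
mu = [[0.2 + 0.1j, -0.3j], [0.5, 0.1 - 0.2j]]
Z = [[0.4 - 0.2j, 0.1 + 0.3j], [-0.2 + 0.5j, 0.3]]
L = [[1 + 1j, 0.5], [-0.3j, 2]]
s = 1 / math.sqrt(2)
U = [[s, 1j * s], [1j * s, s]]

# Image point b: parameters (L mu U, L Sigma L^H), evaluated at Y = L Z U.
Y = matmul(matmul(L, Z), U)
M = matmul(matmul(L, mu), U)
S = matmul(matmul(L, Sigma), herm(L))
lhs = cn_density(Y, M, S) * abs(det2(L)) ** (2 * n)  # times |J(Y -> Z)|
rhs = cn_density(Z, mu, Sigma)
print(lhs, rhs)  # equal up to floating-point rounding
```

Dropping the factor $|\det L|^{2n}$ makes the two values disagree, which is why the Jacobian in (H.16) is essential.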


Appendix  I 
ELEMENTARY HILBERT SPACE THEORY

This  is  a  very  terse  review  of  the  fundamental  definitions  of  Hilbert  space 
theory.  The  following  definitions  are  taken  from  Rudin  [230],  slightly  modified 
to  fit  our  applications. 

Definition 60 A complex vector space $H$ is called an inner product space (or unitary space) if to each ordered pair of vectors $x$ and $y$ in $H$ there is associated a complex number $\langle x, y \rangle$, called the inner product or scalar product of $x$ and $y$, such that the following rules hold.

(a) $\langle y, x \rangle = \langle x, y \rangle^*$. The asterisk denotes complex conjugate.

(b) $\langle z, x + y \rangle = \langle z, x \rangle + \langle z, y \rangle$ if $x$, $y$, and $z$ are in $H$.

(c) $\langle x, ay \rangle = a \langle x, y \rangle$ if $x$ and $y$ are in $H$ and $a$ is a scalar.

(d) $\langle x, x \rangle \ge 0$ for all $x$ in $H$.

(e) $\langle x, x \rangle = 0$ only if $x = 0$.

Note that (b) and (c) say that $\langle x, y \rangle$ is linear in the second argument. This is the change I have made from the usual mathematician's definition, which is linear in the first argument. The reason for doing this is to be able to use the natural notation of the Hermitian transpose in the definition of an inner product. For example, $\langle x, y \rangle = x^H y$ is a valid inner product.
Definition 61 Define the inner product space norm of $x$ to be

$$\|x\|_H = \langle x, x \rangle^{1/2}$$

using the non-negative square root.

Definition 62 Define the distance (or metric) between $x$ and $y$ to be

$$d_H(x, y) = \|x - y\|_H$$

Then all the axioms of a metric space are satisfied. Thus, our inner product space $H$ is now also a metric space.

Definition 63 If every Cauchy sequence, using $d_H(x, y)$, converges in $H$, then this metric space is complete. When this is true, our complete inner product space $H$ is called a Hilbert space.

Example 7 Vectors in $\mathbb{C}^n$ with inner product $\langle x, y \rangle = x^H y$ form a Hilbert space.

Example 8 If $\mu$ is any positive measure, $L^2(\mu)$ is a Hilbert space with inner product $\langle f, g \rangle = \int_X f^* g \, d\mu$. Note that

$$\|f\|_H = \langle f, f \rangle^{1/2} = \left(\int_X |f|^2 \, d\mu\right)^{1/2} = \|f\|_2$$
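Example 7 can be made concrete with a small sketch. The following is a minimal pure-Python illustration (the function names `inner`, `norm`, `dist` and the sample vectors are hypothetical). It implements $\langle x, y \rangle = x^H y$ with the convention of Definition 60, where the conjugate falls on the first argument. It checks rules (a) to (c) and builds the norm and metric of Definitions 61 and 62 from it.

```python
import math

def inner(x, y):
    """<x, y> = x^H y: conjugates the FIRST argument, so it is linear in the second."""
    return sum(complex(xi).conjugate() * yi for xi, yi in zip(x, y))

def norm(x):
    """||x||_H = <x, x>^{1/2}, using the non-negative square root."""
    return math.sqrt(inner(x, x).real)

def dist(x, y):
    """d_H(x, y) = ||x - y||_H."""
    return norm([xi - yi for xi, yi in zip(x, y)])

x = [1 + 2j, -1j, 3]
y = [2 - 1j, 4, 1 + 1j]
z = [0.5j, 1 - 1j, -2]
a = 2 - 3j

# (a) conjugate symmetry
assert abs(inner(y, x) - inner(x, y).conjugate()) < 1e-12
# (b), (c) linearity in the second argument
assert abs(inner(z, [xi + yi for xi, yi in zip(x, y)]) - (inner(z, x) + inner(z, y))) < 1e-12
assert abs(inner(x, [a * yi for yi in y]) - a * inner(x, y)) < 1e-12
# Scaling the first argument pulls out the conjugate instead.
assert abs(inner([a * xi for xi in x], y) - a.conjugate() * inner(x, y)) < 1e-12
```

Putting the conjugate on the first argument is exactly what lets $\langle x, y \rangle$ be written as the matrix product $x^H y$.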


Appendix  J 
COMPLEX VECTORS

In this section we will define a vector space and, by way of an example, show some of the problems associated with treating a vector space in $n$-dimensional complex numbers $\mathbb{C}^n$ as a vector space in $2n$-dimensional real numbers $\mathbb{R}^{2n}$. We first define a field and then a vector space.

J.1 Abstract Field

A vector space requires a special kind of group called an abelian group. An abelian group is a group that also obeys the following rule (Group Rule number 5).

5. For all $a \in G$ and $b \in G$, $a \square b = b \square a$. The order of the elements is not important. When this is true, we call the operator $\square$ a commutative operator. We say that $a$ and $b$ commute under $\square$.

Let $G/\{e_\square\}$ denote the set $G$ with the element $e_\square$ removed. Now, equip $G/\{e_\square\}$ with an operator $\circ$ such that $(G/\{e_\square\}, \circ)$ is an abelian group. We now have defined two groups, $(G, \square)$ and $(G/\{e_\square\}, \circ)$. When we glue these two groups together using rules (6) and (7) below, we get a field. We denote this by $\mathcal{F} = (G, \square, \circ)$ when the context is clear. Group Rules (6) and (7) are called the distributive laws.

6. For all $a, b, c \in G$, $a \circ (b \square c) = (a \circ b) \square (a \circ c)$.

7. For all $a, b, c \in G$, $(b \square c) \circ a = (b \circ a) \square (c \circ a)$.

We are familiar with two common fields: (i) the field over the real numbers $(\mathbb{R}, +, \cdot)$, where $+$ is ordinary addition and $\cdot$ is ordinary multiplication, and (ii) the field over the complex numbers $(\mathbb{C}, +_C, \cdot_C)$, where $+_C$ is complex addition and $\cdot_C$ is complex multiplication.


J.2  Abstract  Vector  Space 

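A nonempty set $V$, together with operators $\Delta$ and $\star$, is said to be a vector space over a field $\mathcal{F} = (G, \square, \circ)$ if the following rules (a) through (j) hold.

(a) For each $x \in V$ and $y \in V$, $x \Delta y \in V$.

(b) For each $x \in V$ and $a \in \mathcal{F}$, $a \star x \in V$.

(c) $x \Delta y = y \Delta x$ for all $x, y \in V$.

(d) $(x \Delta y) \Delta z = x \Delta (y \Delta z)$ for all $x, y, z \in V$.

(e) There is an identity element $e_\Delta \in V$ such that $e_\Delta \Delta x = x$ for all $x \in V$.

(f) There is an inverse element $y \in V$ for each $x \in V$ such that $x \Delta y = e_\Delta$.

(g) For all $x, y \in V$ and $a \in \mathcal{F}$, we have the distributive law that

$$a \star (x \Delta y) = (a \star x) \Delta (a \star y) \in V$$

(h) For all $x \in V$ and $a, b \in \mathcal{F}$, we have the distributive law that

$$(a \square b) \star x = (a \star x) \Delta (b \star x) \in V$$

(i) For all $x \in V$ and $a, b \in \mathcal{F}$, we have the associative law that

$$a \star (b \star x) = (a \circ b) \star x$$

Note the operators! There is a change: the $\star$ between $a$ and $b$ on the left becomes $\circ$ on the right.

(j) There is an identity element $e_\circ$ where $e_\circ \star x = x$ for all $x \in V$, and $e_\circ$ is the identity in $G/\{e_\square\}$ under $\circ$.

We denote the vector space by $(V, \Delta, \star, \mathcal{F})$ or

$$\mathcal{V} = (V, \Delta, \star, G, \square, \circ)$$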

J.3  Complex  Vector  Space 

It is occasionally claimed that a vector space in $\mathbb{C}^n$ is merely a vector space over $\mathbb{R}^{2n}$. It is not quite that simple. For example, consider Broida and Williamson's problem 2.2.4 [47]. The vectors $u = (1 + i, 2i)$ and $v = (1, 1 + i)$ in $\mathbb{C}^2$ are linearly dependent over $\mathbb{C}$, but linearly independent over $\mathbb{R}$. In $\mathbb{C}$,

$$iu + (1 - i)v = 0$$

Consider the elements $u, v \in V$ as in a vector space over $\mathbb{R}$. This means that the scalars come from $\mathbb{R}$. In this case, there are no scalars $a, b \in \mathbb{R} = \mathcal{F}$, other than $a = b = 0$, such that $au + bv = 0$. Therefore $(V, \Delta, \star) = (\mathbb{C}^2, +, \cdot)$ is not a vector space over the field of real numbers $\mathcal{F} = (\mathbb{R}, +, \cdot)$.

Consider representing an element in $V$ given by $(a + ib, c + id)$ in $\mathbb{C}^2$ as $(a, b, c, d)$ in $\mathbb{R}^4$. Then $u$ and $v$ are expressed as $u = (1, 1, 0, 2)$ and $v = (1, 0, 1, 1)$. These are linearly independent whether the field $\mathcal{F}$ is taken to be $\mathbb{R}$ or $\mathbb{C}$. What is missing is the accounting for the structure of $\mathbb{C}$ under complex multiplication. There is a way of representing $V$, where $V = \mathbb{C}^n$ and $\mathcal{F} = \mathbb{C}$, with $V \subset \mathbb{R}^{2n \times 2}$ and $\mathcal{F} \subset \mathbb{R}^{2 \times 2}$. The complex number $x + iy \in \mathbb{C}$ is isomorphic to the matrix

$$\begin{pmatrix} x & -y \\ y & x \end{pmatrix} \in \mathbb{R}^{2 \times 2}$$

Using this form, matrix addition and matrix multiplication yield the same answers as complex addition and complex multiplication in the scalar case. In our vector space, care must be taken in defining $a \star x$ where $a \in \mathcal{F}$ and $x \in V \subset \mathbb{R}^{2n \times 2}$. For


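$$x = \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}, \qquad Z_i = \begin{pmatrix} X_i & -Y_i \\ Y_i & X_i \end{pmatrix}$$

we define

$$a \star x = \begin{pmatrix} a \circ Z_1 \\ \vdots \\ a \circ Z_n \end{pmatrix}$$

where $\circ$ is matrix multiplication of matrices in $\mathbb{R}^{2 \times 2}$. Then from our example, with $i$ represented by $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $1 - i$ represented by $\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$,

$$i \star u = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \star \begin{pmatrix} 1 & -1 \\ 1 & 1 \\ 0 & -2 \\ 2 & 0 \end{pmatrix} = \begin{pmatrix} -1 & -1 \\ 1 & -1 \\ -2 & 0 \\ 0 & -2 \end{pmatrix}$$

and

$$(1 - i) \star v = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \star \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ -1 & 1 \\ 2 & 0 \\ 0 & 2 \end{pmatrix}$$

so the blockwise sum of $i \star u$ and $(1 - i) \star v$ is the zero matrix. We are now back to a structure where $u$ and $v$ are linearly dependent.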
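The example above can be reproduced mechanically. The following minimal pure-Python sketch encodes each complex number as the $2 \times 2$ real matrix $\begin{pmatrix} x & -y \\ y & x \end{pmatrix}$ and each vector in $\mathbb{C}^n$ as a stack of such blocks. The helper names `to_mat`, `star`, `delta` are hypothetical, not the thesis's notation. The sketch confirms that $i \star u$ plus $(1 - i) \star v$ is zero in every block.

```python
def to_mat(z):
    """Represent z = x + iy by the real 2 x 2 matrix [[x, -y], [y, x]]."""
    z = complex(z)
    return [[z.real, -z.imag], [z.imag, z.real]]

def mul2(a, b):
    """2 x 2 real matrix product (the operator o)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add2(a, b):
    """2 x 2 matrix sum."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def vec(zs):
    """A vector in C^n becomes a stack of n blocks, i.e. a 2n x 2 real array."""
    return [to_mat(z) for z in zs]

def star(a, x):
    """Scalar action a * x: left-multiply every block of x by the scalar matrix a."""
    return [mul2(a, block) for block in x]

def delta(x, y):
    """Vector addition: blockwise matrix addition."""
    return [add2(bx, by) for bx, by in zip(x, y)]

u = vec([1 + 1j, 2j])
v = vec([1, 1 + 1j])
combo = delta(star(to_mat(1j), u), star(to_mat(1 - 1j), v))
print(all(abs(e) < 1e-12 for block in combo for row in block for e in row))  # True
```

A flat $(a, b, c, d)$ encoding would lose this dependence, because the scalar would act on each real coordinate separately.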


J.3.1  Verifying  if  a  Field  is  Defined 


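It might be worthwhile just to verify that we really do have a field and a vector space under this structure. Let

$$a = \begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}, \quad b = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}, \quad \text{and} \quad c = \begin{pmatrix} c_1 & -c_2 \\ c_2 & c_1 \end{pmatrix}$$

Define $\square$ to be matrix addition and $\circ$ to be matrix multiplication. Let $G$ be the set of all $2 \times 2$ matrices of the form

$$\begin{pmatrix} x & -y \\ y & x \end{pmatrix}$$

where $x, y \in \mathbb{R}$. Then we examine each of the properties.

3a. The identity element under $\square$ is

$$e_\square = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$

which is the zero matrix.

4a. The inverse element of $a$ under $\square$ is

$$\begin{pmatrix} -a_1 & a_2 \\ -a_2 & -a_1 \end{pmatrix}.$$

5a. $G$ is abelian under $\square$, which is inherited from addition of real numbers in $(\mathbb{R}, +)$.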
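We also need to verify that $(G/\{e_\square\}, \circ)$ is an abelian group. Let $a, b, c \in G/\{e_\square\}$.

1b.

$$a \circ b = \begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}\begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix} = \begin{pmatrix} a_1b_1 - a_2b_2 & -(a_2b_1 + a_1b_2) \\ a_2b_1 + a_1b_2 & a_1b_1 - a_2b_2 \end{pmatrix} \in G$$

Thus $a \circ b$ is well defined in $G$.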

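2b.

$$\begin{aligned}
(a \circ b) \circ c &= \begin{pmatrix} a_1b_1 - a_2b_2 & -(a_2b_1 + a_1b_2) \\ a_2b_1 + a_1b_2 & a_1b_1 - a_2b_2 \end{pmatrix}\begin{pmatrix} c_1 & -c_2 \\ c_2 & c_1 \end{pmatrix}\\
&= \begin{pmatrix} (a_1b_1 - a_2b_2)c_1 - (a_2b_1 + a_1b_2)c_2 & -[(a_2b_1 + a_1b_2)c_1 + (a_1b_1 - a_2b_2)c_2] \\ (a_2b_1 + a_1b_2)c_1 + (a_1b_1 - a_2b_2)c_2 & (a_1b_1 - a_2b_2)c_1 - (a_2b_1 + a_1b_2)c_2 \end{pmatrix}\\
&= \begin{pmatrix} a_1(b_1c_1 - b_2c_2) - a_2(b_2c_1 + b_1c_2) & -[a_1(b_2c_1 + b_1c_2) + a_2(b_1c_1 - b_2c_2)] \\ a_1(b_2c_1 + b_1c_2) + a_2(b_1c_1 - b_2c_2) & a_1(b_1c_1 - b_2c_2) - a_2(b_2c_1 + b_1c_2) \end{pmatrix}\\
&= \begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}\begin{pmatrix} b_1c_1 - b_2c_2 & -(b_2c_1 + b_1c_2) \\ b_2c_1 + b_1c_2 & b_1c_1 - b_2c_2 \end{pmatrix}\\
&= \begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}\left[\begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}\begin{pmatrix} c_1 & -c_2 \\ c_2 & c_1 \end{pmatrix}\right]\\
&= a \circ (b \circ c)
\end{aligned}$$

Therefore $G$ is associative under $\circ$.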


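3b. The identity element is

$$e_\circ = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

4b. Each $a \ne e_\square$ has the inverse

$$a^{-1} = \frac{1}{a_1^2 + a_2^2}\begin{pmatrix} a_1 & a_2 \\ -a_2 & a_1 \end{pmatrix} \in G.$$

5b. $G/\{e_\square\}$ is abelian under $\circ$:

$$b \circ a = \begin{pmatrix} b_1a_1 - b_2a_2 & -(b_2a_1 + b_1a_2) \\ b_2a_1 + b_1a_2 & b_1a_1 - b_2a_2 \end{pmatrix} = a \circ b.$$

For the first distributive law,

$$\begin{aligned}
a \circ (b \square c) &= \begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}\begin{pmatrix} b_1 + c_1 & -(b_2 + c_2) \\ b_2 + c_2 & b_1 + c_1 \end{pmatrix}\\
&= \begin{pmatrix} a_1(b_1 + c_1) - a_2(b_2 + c_2) & -[a_1(b_2 + c_2) + a_2(b_1 + c_1)] \\ a_1(b_2 + c_2) + a_2(b_1 + c_1) & a_1(b_1 + c_1) - a_2(b_2 + c_2) \end{pmatrix}\\
&= \begin{pmatrix} a_1b_1 - a_2b_2 & -(a_1b_2 + a_2b_1) \\ a_1b_2 + a_2b_1 & a_1b_1 - a_2b_2 \end{pmatrix} + \begin{pmatrix} a_1c_1 - a_2c_2 & -(a_1c_2 + a_2c_1) \\ a_1c_2 + a_2c_1 & a_1c_1 - a_2c_2 \end{pmatrix}\\
&= \begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}\begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix} + \begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}\begin{pmatrix} c_1 & -c_2 \\ c_2 & c_1 \end{pmatrix}\\
&= (a \circ b) \square (a \circ c)
\end{aligned}$$

Thus the first distributive law in $\mathcal{F}$ is satisfied.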


$$\begin{aligned}
(b \square c) \circ a &= \begin{pmatrix} b_1 + c_1 & -(b_2 + c_2) \\ b_2 + c_2 & b_1 + c_1 \end{pmatrix}\begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}\\
&= \begin{pmatrix} (b_1 + c_1)a_1 - (b_2 + c_2)a_2 & -[(b_1 + c_1)a_2 + (b_2 + c_2)a_1] \\ (b_1 + c_1)a_2 + (b_2 + c_2)a_1 & (b_1 + c_1)a_1 - (b_2 + c_2)a_2 \end{pmatrix}\\
&= \begin{pmatrix} b_1a_1 - b_2a_2 & -(b_1a_2 + b_2a_1) \\ b_1a_2 + b_2a_1 & b_1a_1 - b_2a_2 \end{pmatrix} + \begin{pmatrix} c_1a_1 - c_2a_2 & -(c_1a_2 + c_2a_1) \\ c_1a_2 + c_2a_1 & c_1a_1 - c_2a_2 \end{pmatrix}\\
&= \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}\begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix} + \begin{pmatrix} c_1 & -c_2 \\ c_2 & c_1 \end{pmatrix}\begin{pmatrix} a_1 & -a_2 \\ a_2 & a_1 \end{pmatrix}\\
&= (b \circ a) \square (c \circ a)
\end{aligned}$$

Thus the second distributive law in $\mathcal{F}$ is satisfied.
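The symbolic checks above can be complemented by a numerical spot check on random elements of $G$. This is a minimal pure-Python sketch, not a proof: it samples 100 random triples (the seed and sampling range are arbitrary assumptions). For each triple it tests closure, associativity, commutativity, the inverse under $\circ$, and Group Rules (6) and (7). The helper names `g`, `add`, `mul`, `inv` are hypothetical.

```python
import random

def g(x, y):
    """Element of G: the real 2 x 2 matrix [[x, -y], [y, x]]."""
    return [[x, -y], [y, x]]

def add(a, b):  # the operator square: matrix addition
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mul(a, b):  # the operator o: matrix multiplication
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv(a):
    """Inverse under o of a nonzero element: (1/(x^2 + y^2)) [[x, y], [-y, x]]."""
    x, y = a[0][0], a[1][0]
    r = x * x + y * y
    return g(x / r, -y / r)

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def in_G(a, tol=1e-9):
    """True if a has the pattern [[x, -y], [y, x]]."""
    return abs(a[0][0] - a[1][1]) < tol and abs(a[0][1] + a[1][0]) < tol

rng = random.Random(0)
for _ in range(100):
    a, b, c = (g(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(3))
    assert in_G(mul(a, b))                                      # 1b: closure under o
    assert close(mul(mul(a, b), c), mul(a, mul(b, c)))          # 2b: associativity
    assert close(mul(a, b), mul(b, a))                          # 5b: commutativity
    assert close(mul(a, inv(a)), g(1.0, 0.0))                   # inverse under o
    assert close(mul(a, add(b, c)), add(mul(a, b), mul(a, c)))  # rule 6
    assert close(mul(add(b, c), a), add(mul(b, a), mul(c, a)))  # rule 7
print("all sampled field axioms hold")
```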

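Therefore, $(G, \square, \circ)$ is a field. If we let an element of $V$ be

$$x = \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}, \quad \text{where} \quad Z_i = \begin{pmatrix} x_i & -y_i \\ y_i & x_i \end{pmatrix},$$

define the operator $\Delta$ to be matrix addition.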

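(a) Then for every $x, y \in V$ we know $x \Delta y \in V$.

(b) From examining $a \circ b$, we know that the product of two matrices of the form of $Z_i$ will again have that form. Thus

$$a \star x = \begin{pmatrix} a \circ Z_1 \\ \vdots \\ a \circ Z_n \end{pmatrix}$$

is in the same form as

$$\begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}$$

Therefore $a \star x \in V$.

(c) Because matrix addition is abelian, we know $x \Delta y = y \Delta x$ for all $x, y \in V$.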
Thus the first distributive law in $V$ is satisfied.

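(h)

$$(a \square b) \star x = (a \square b) \star \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix} = \begin{pmatrix} (a \square b) \circ Z_1 \\ \vdots \\ (a \square b) \circ Z_n \end{pmatrix} = \begin{pmatrix} (a \circ Z_1) \square (b \circ Z_1) \\ \vdots \\ (a \circ Z_n) \square (b \circ Z_n) \end{pmatrix}$$

by the second distributive law in $\mathcal{F}$. This, in turn, equals

$$(a \star x) \Delta (b \star x)$$

Thus the second distributive law in $V$ is satisfied.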

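(i)

$$a \star (b \star x) = a \star \begin{pmatrix} b \circ Z_1 \\ \vdots \\ b \circ Z_n \end{pmatrix} = \begin{pmatrix} a \circ (b \circ Z_1) \\ \vdots \\ a \circ (b \circ Z_n) \end{pmatrix} = \begin{pmatrix} (a \circ b) \circ Z_1 \\ \vdots \\ (a \circ b) \circ Z_n \end{pmatrix}$$

by associativity of $\circ$ in $\mathcal{F}$, which gives us $(a \circ b) \star x$. Thus the associative law in $V$ is satisfied.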
(j) The identity element

$$e_\circ = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

in $G$ under $\circ$ satisfies the scalar identity in $V$:

$$e_\circ \star x = \begin{pmatrix} e_\circ \circ Z_1 \\ \vdots \\ e_\circ \circ Z_n \end{pmatrix} = \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix} = x$$

Conclude that we successfully defined a linear vector space $(V, \Delta, \star, \mathcal{F}) = \mathcal{V}$. We have shown that the vector space

$$(\mathbb{C}^n, +, \cdot, \mathbb{C}, +, \cdot)$$

is isomorphic to

$$(V, \Delta, \star, G, \square, \circ)$$

where the operators $+$ and $\cdot$ are context sensitive. One valid observation is that developing complex vector theory using only real numbers is really more complex than sticking with complex numbers. We have also shown that $\mathbb{C}^n$, as a vector space over $\mathbb{C}$, is not simply $\mathbb{R}^{2n}$ with real scalars; that identification omits a lot of structure present in $\mathbb{C}^n$. As a curiosity, we also showed that under proper conditions, special kinds of matrices can be both vectors and scalars.
J.4 Construction of Vector Space in $\mathbb{R}^{2n}$ Isomorphic to $\mathbb{C}^n$

I want to construct a vector space in $\mathbb{R}^{2n}$ that is isomorphic to a vector space in $\mathbb{C}^n$, where the vector space $\mathbb{C}^n$ over the field $\mathbb{C}$ has the usual complex addition and multiplication operators.

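Let elements of $G_R$ be in $\mathbb{R}^2$, such as

$$\begin{pmatrix} a \\ b \end{pmatrix}$$

which correspond to elements of $G_C$ in $\mathbb{C}$ such as $a + ib$. Then the addition operator $\square_R$ defined by

$$\begin{pmatrix} a \\ b \end{pmatrix} \square_R \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} a + c \\ b + d \end{pmatrix},$$

where $+$ is ordinary addition, corresponds to $\square_C$ defined by

$$(a + ib) \square_C (c + id) = (a + c) + i(b + d)$$

The multiplication operator $\circ_R$ defined by

$$\begin{pmatrix} a \\ b \end{pmatrix} \circ_R \begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} a \cdot c - b \cdot d \\ b \cdot c + a \cdot d \end{pmatrix},$$

where $+$ and $-$ are ordinary addition and subtraction, and $\cdot$ is ordinary multiplication, corresponds to $\circ_C$ defined by

$$(a + ib) \circ_C (c + id) = (ac - bd) + i(bc + ad)$$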
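Let $V_R \subset \mathbb{R}^{2n}$ have typical elements like

$$(a_1, b_1, a_2, b_2, \ldots, a_n, b_n)^T$$

to correspond to $V_C \subset \mathbb{C}^n$ with elements that look like

$$(a_1 + ib_1, a_2 + ib_2, \ldots, a_n + ib_n)^T$$

Define operator $\Delta_R$ to be ordinary real element-wise addition between elements of $V_R$, and $\Delta_C$ to be ordinary complex element-wise addition between elements of $V_C$. Finally, define the operator $\star_R$ between elements of $G_R$ and $V_R$ to be

$$\begin{pmatrix} x \\ y \end{pmatrix} \star_R \begin{pmatrix} a_1 \\ b_1 \\ \vdots \\ a_n \\ b_n \end{pmatrix} = \begin{pmatrix} x \cdot a_1 - y \cdot b_1 \\ x \cdot b_1 + y \cdot a_1 \\ \vdots \\ x \cdot a_n - y \cdot b_n \\ x \cdot b_n + y \cdot a_n \end{pmatrix}$$

where $\cdot$, $+$, and $-$ are the usual real arithmetic operators, which corresponds to the operator $\star_C$ defined by

$$(x + iy) \star_C \begin{pmatrix} a_1 + ib_1 \\ \vdots \\ a_n + ib_n \end{pmatrix} = \begin{pmatrix} (x \cdot a_1 - y \cdot b_1) + i(x \cdot b_1 + y \cdot a_1) \\ \vdots \\ (x \cdot a_n - y \cdot b_n) + i(x \cdot b_n + y \cdot a_n) \end{pmatrix}$$

With these definitions, the vector space $V_R$ is isomorphic to $V_C$. In this sense, $\mathbb{R}^{2n} \cong \mathbb{C}^n$.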
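One way to see the isomorphism concretely is to check that the coordinate map sends $\Delta_C$ to $\Delta_R$ and $\star_C$ to $\star_R$. This minimal pure-Python sketch assumes the interleaved layout $(a_1, b_1, \ldots, a_n, b_n)$ used above. The map `phi` and the helpers `delta_r` and `star_r` are hypothetical names, and the sample vectors are arbitrary.

```python
def phi(zs):
    """Map (a_1 + i b_1, ..., a_n + i b_n) in C^n to (a_1, b_1, ..., a_n, b_n) in R^{2n}."""
    out = []
    for z in zs:
        z = complex(z)
        out.extend([z.real, z.imag])
    return out

def delta_r(v, w):
    """Delta_R: ordinary element-wise real addition."""
    return [vi + wi for vi, wi in zip(v, w)]

def star_r(s, v):
    """*_R: the scalar s = (x, y) in G_R acting on each (a_k, b_k) pair of v."""
    x, y = s
    out = []
    for k in range(0, len(v), 2):
        a, b = v[k], v[k + 1]
        out.extend([x * a - y * b, x * b + y * a])
    return out

def close(v, w, tol=1e-12):
    return len(v) == len(w) and all(abs(p - q) < tol for p, q in zip(v, w))

zs = [1 + 2j, -3 + 0.5j, 4j]
ws = [2 - 1j, 1 + 1j, -1]
s = 0.5 - 2j

# phi turns Delta_C into Delta_R and *_C into *_R.
assert close(phi([z + w for z, w in zip(zs, ws)]), delta_r(phi(zs), phi(ws)))
assert close(phi([s * z for z in zs]), star_r((s.real, s.imag), phi(zs)))
```

The scalar $s$ must act on each $(a_k, b_k)$ pair jointly, as complex multiplication does. Acting on each real coordinate separately would break the correspondence.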


J.5  A  More  General  Vector  Space 

In this part, we define a vector space in a way that buys us more freedom than any of the previous definitions. Vector spaces usually require that the vector be constructed from elements of the field of scalars, such that $\mathcal{F} \subset S$ and $V \subset S^n$. For example, our experience is most frequently with the structure $\mathcal{F} \times \mathcal{F}^n$. However, this is not a necessary restriction.

Let $(\mathcal{F}, +, \cdot)$ be a field with additive identity $0$ and multiplicative identity $1$. Let $(V, \boxplus)$ be an abelian group with identity $0$. A vector space is the Cartesian product $\mathcal{F} \times V$ with elements $x = (a, v)$ and operators $\oplus$ and $\odot$, denoted by

$$\mathcal{V} = (\mathcal{F} \times V, +, \cdot, \boxplus, \oplus, \odot)$$

or by

$$\mathcal{V} = (\mathcal{F} \times V, \oplus, \odot)$$

with the following properties.
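(1) $\oplus : (\mathcal{F} \times V) \times (\mathcal{F} \times V) \to \mathcal{F} \times V$

(2) $\odot : \mathcal{F} \times (\mathcal{F} \times V) \to \mathcal{F} \times V$

(3) $x \oplus y = y \oplus x$ for all $x, y \in \mathcal{F} \times V$

(4) $(x \oplus y) \oplus z = x \oplus (y \oplus z)$ for all $x, y, z \in \mathcal{F} \times V$

(5) [Key point] Let $\Theta = \{(0, v), (a, 0)\}$ for all $v \in V$ and $a \in \mathcal{F}$. Then

(a) $0 \odot x = a \odot e$ for all $x \in \mathcal{F} \times V$, $a \in \mathcal{F}$, $e \in \Theta$.

(b) $e \oplus x = x$ for all $x \in \mathcal{F} \times V$, $e \in \Theta$.

(6) [Key point] Let $-X = \{(-a, v), (a, \boxminus v)\}$, where $-a$ is the inverse of $a$ in $(\mathcal{F}, +, \cdot)$ under $+$, and $\boxminus v$ is the inverse of $v$ in $(V, \boxplus)$. Then for each $x = (a, v) \in \mathcal{F} \times V$ there exists $-X$ such that $x \oplus (-x) \in \Theta$ for all $-x \in -X$ associated with $x$.

(7) $a \odot (x \oplus y) = (a \odot x) \oplus (a \odot y)$ for all $a \in \mathcal{F}$ and for all $x, y \in \mathcal{F} \times V$.

(8) $(a + b) \odot x = (a \odot x) \oplus (b \odot x)$ for all $a, b \in \mathcal{F}$ and for all $x \in \mathcal{F} \times V$.

(9) $1 \odot x = x$ for all $x \in \mathcal{F} \times V$.

(10) $(a \cdot b) \odot x = a \odot (b \odot x)$ for all $a, b \in \mathcal{F}$ and for all $x \in \mathcal{F} \times V$.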


Appendix  K 
COMPLEX MATRIX ALGEBRA

The results contained here are mechanical in nature. Proofs of the partitioned matrix determinant are interesting enough to go through once, but not to commit to memory. Other theorem statements should be read, but the proofs are mundane and time is better spent on other material. These proofs are included because I had to do them to extend results from the real case to the complex case.

K.1 Basic Definitions

Definition 64 Matrix $A$ is Hermitian if $A = A^H$, where $A^H$, the transpose of its complex conjugate, is called the Hermitian transpose.

Definition 65 Matrix $A$ is called positive semidefinite or nonnegative definite if $x^H A x \ge 0$ for all $x$.

Definition 66 Matrix $A$ is called positive definite if $x^H A x > 0$ for all nonzero $x$.

Definition 67 Matrix $A$ is called negative semidefinite or nonpositive definite if $x^H A x \le 0$ for all $x$.

Definition 68 Matrix $A$ is called negative definite if $x^H A x < 0$ for all nonzero $x$.

Definition 69 Matrix $A$ is called definite if it is either positive definite or negative definite.

Definition 70 Matrix $A$ is called indefinite if $x^H A x > 0$ for some $x$, and $x^H A x < 0$ for some other $x$. It is also possible for $x^H A x = 0$ for some $x$.

Definition 71 Let $A \in \mathbb{C}^{n \times n}$. Then $A$ is called orthogonal if $AA^H = A^H A = D$ is a diagonal matrix, not necessarily the identity matrix.

Definition 72 Let $A \in \mathbb{C}^{n \times n}$. Then $A$ is called complex orthogonal if $AA^T = A^T A = D$ is a diagonal matrix, not necessarily the identity matrix.

Definition 73 Let $A \in \mathbb{R}^{n \times n}$. Then $A$ is called orthonormal if $AA^T = A^T A = I_n$.

Definition 74 Let $A \in \mathbb{C}^{n \times n}$. Then $A$ is unitary if $AA^H = A^H A = I_n$, and thus $A^{-1} = A^H$.

Most  authors  call  such  a  matrix  orthonormal,  using  the  same  terminology 
they  use  for  real  matrices.  Many  authors  refer  to  this  as  orthogonal.  Thus  a 
complex  square  matrix  with  orthonormal  columns  is  called  a  unitary  matrix. 
(Stewart,  p.  259.)  [259]. 
Example 9 From Nomizu (p. 250) [193], the following example illustrates that orthogonal and unitary are not the same concepts. Let

$$Y = \begin{pmatrix} i & \sqrt{2} \\ \sqrt{2} & -i \end{pmatrix}$$

Then $Y^T Y = I$, but

$$Y^H Y = \begin{pmatrix} 3 & -i2\sqrt{2} \\ i2\sqrt{2} & 3 \end{pmatrix}$$

Thus $Y$ is complex orthogonal, but $Y$ is not unitary. Conversely,

$$X = \begin{pmatrix} e^{i\omega} & 0 \\ 0 & e^{i\omega} \end{pmatrix}$$

for $\omega \ne m\pi$ is unitary but not complex orthogonal. However, an orthogonal real matrix is unitary.
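Example 9 is easy to confirm numerically. This minimal pure-Python sketch builds $Y$ and $X$ as given, with $\omega = 0.3$ chosen arbitrarily as a value that is not a multiple of $\pi$. It then tests whether $Y^T Y$, $Y^H Y$, $X^H X$ and $X^T X$ equal the identity. The helper names are hypothetical.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def herm(a):
    """Hermitian transpose: conjugate of the transpose."""
    return [[complex(x).conjugate() for x in row] for row in zip(*a)]

def is_identity(m, tol=1e-12):
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m)))

r2 = math.sqrt(2)
Y = [[1j, r2], [r2, -1j]]
print(is_identity(matmul(transpose(Y), Y)))  # True:  Y^T Y = I (complex orthogonal)
print(is_identity(matmul(herm(Y), Y)))       # False: Y^H Y has 3 on the diagonal

omega = 0.3  # illustrative choice, not a multiple of pi
X = [[cmath.exp(1j * omega), 0], [0, cmath.exp(1j * omega)]]
print(is_identity(matmul(herm(X), X)))       # True:  X is unitary
print(is_identity(matmul(transpose(X), X)))  # False: X^T X = exp(2 i omega) I
```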

Definition 75 Let $A \in \mathbb{C}^{m \times n}$. Then $A$ is subunitary if $AA^H = I_m$ and $m < n$, or if $A^H A = I_n$ and $m > n$. Here $A^{-1}$ is undefined.

Definition 76 Matrix $A$ is called skew-symmetric if $A = -A^T$.

The covariance matrix of the complex normal distribution, when expressed in $\mathbb{R}^{2n \times 2n}$, is skew-symmetric. When expressed in $\mathbb{C}^{n \times n}$, the covariance matrix of the complex normal distribution is Hermitian.

Definition 77 Matrix $A$ is called skew-Hermitian if $A = -A^H$.

Properties. The elements on the main diagonal are either pure imaginary or zero. The imaginary part of $A$ is symmetric, $\mathrm{Im}(A) = [\mathrm{Im}(A)]^T$. As for the real part, $\mathrm{Re}(A) = -[\mathrm{Re}(A)]^T$, so the diagonal of $\mathrm{Re}(A)$ is zero. $A$ is skew-Hermitian if and only if $-iA$ is Hermitian.

Suppose $A = -A^H$. Consider $BAB^H$. Take its Hermitian transpose.

$$(BAB^H)^H = BA^H B^H = -BAB^H$$

Thus, $BAB^H$ is also skew-Hermitian.


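Example 10 Let

$$\Phi = \begin{pmatrix} a + ib & e + if \\ g + ih & c + id \end{pmatrix}$$

Then

$$-\Phi^H = \begin{pmatrix} -a + ib & -g + ih \\ -e + if & -c + id \end{pmatrix}$$

For $\Phi = -\Phi^H$, we must have $a = -a = 0$, $c = -c = 0$, $g = -e$, and $f = h$. Thus we have

$$\Phi = \begin{pmatrix} ib & e + if \\ -e + if & id \end{pmatrix}$$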

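Proposition 48 If $A$ is positive semidefinite, then $A$ is Hermitian.

Proof. $A$ is positive semidefinite (or, $A$ is nonnegative definite) means that $x^H A x \ge 0$ for all $x$. Since $x^H A x \ge 0$, it must be real. So $(x^H A x)^H = x^H A x$. This implies

$$x^H A^H x - x^H A x = x^H (A^H - A) x = 0$$

for all $x \in \mathbb{C}^n$. A complex matrix $B$ with $x^H B x = 0$ for every complex vector $x$ must be the zero matrix: applying the condition to $x = u + v$ and to $x = u + iv$ gives $u^H B v = 0$ for all $u, v$. Thus $A = A^H$. □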


K.2  Trace  Identities 


The  various  trace  identities  that  have  been  worked  out  are  ones  that  have  been 
required  one  or  more  times  in  subsequent  work.  This  is  particularly  true  for 
those  used  in  the  evaluation  of  moments  from  characteristic  functions. 


Lemma 25 Let $B = W^H A W$, where $W$ and $A$ have no required structure. Then

$$B_{lm} = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{il}^* A_{ij} W_{jm}$$

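Proof. Let $B = CAT$, where $A, B, C, T \in \mathbb{C}^{n \times n}$. Partition $T$ into columns as $T = (T_1, T_2, \ldots, T_n)$ and $C$ into rows as

$$C = \begin{pmatrix} C^1 \\ \vdots \\ C^n \end{pmatrix}$$

Then

$$B = \begin{pmatrix} C^1AT_1 & C^1AT_2 & \cdots & C^1AT_n \\ \vdots & & & \vdots \\ C^nAT_1 & C^nAT_2 & \cdots & C^nAT_n \end{pmatrix}$$

Consider element $B_{lm}$, computed by

$$B_{lm} = C^l A T_m = (C_{l1}, \ldots, C_{ln}) \begin{pmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{n1} & \cdots & A_{nn} \end{pmatrix} \begin{pmatrix} T_{1m} \\ \vdots \\ T_{nm} \end{pmatrix} = (C_{l1}, \ldots, C_{ln}) \begin{pmatrix} \sum_{j=1}^n A_{1j}T_{jm} \\ \vdots \\ \sum_{j=1}^n A_{nj}T_{jm} \end{pmatrix} = \sum_{i=1}^{n} \sum_{j=1}^{n} C_{li} A_{ij} T_{jm}$$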


Now let $C = W^H$ and $T = W$, so that $C_{li} = W_{il}^*$. Then

$$B_{lm} = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{il}^* A_{ij} W_{jm}$$

□


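Lemma 26 Let $B_{m \times n}$ and $Z_{n \times m}$ be complex matrices. Then

$$\mathrm{tr}(BZ) = \sum_{j=1}^{m} \sum_{k=1}^{n} B_{jk} Z_{kj}$$

Proof.

$$\mathrm{tr}(BZ) = \mathrm{tr}\left[\begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1n} \\ B_{21} & B_{22} & \cdots & B_{2n} \\ \vdots & & & \vdots \\ B_{m1} & B_{m2} & \cdots & B_{mn} \end{pmatrix}\begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1m} \\ Z_{21} & Z_{22} & \cdots & Z_{2m} \\ \vdots & & & \vdots \\ Z_{n1} & Z_{n2} & \cdots & Z_{nm} \end{pmatrix}\right] = \sum_{k=1}^{n} B_{1k}Z_{k1} + \sum_{k=1}^{n} B_{2k}Z_{k2} + \cdots + \sum_{k=1}^{n} B_{mk}Z_{km} = \sum_{j=1}^{m} \sum_{k=1}^{n} B_{jk} Z_{kj}$$

□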
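Lemma 27 Let $B_{m \times n}$ and $Z_{m \times n}$ be complex matrices. Then

$$\sum_{j=1}^{m} \sum_{k=1}^{n} B_{jk} Z_{jk} = \mathrm{tr}(B^T Z) = \mathrm{tr}(Z B^T) = \mathrm{tr}(Z^T B) = \mathrm{tr}(B Z^T)$$

Proof.

$$\begin{aligned}
\mathrm{tr}(B^T Z) &= \mathrm{tr}\left[\begin{pmatrix} B_{11} & B_{21} & \cdots & B_{m1} \\ B_{12} & B_{22} & \cdots & B_{m2} \\ \vdots & & & \vdots \\ B_{1n} & B_{2n} & \cdots & B_{mn} \end{pmatrix}\begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & & & \vdots \\ Z_{m1} & Z_{m2} & \cdots & Z_{mn} \end{pmatrix}\right]\\
&= \sum_{j=1}^{m} B_{j1}Z_{j1} + \sum_{j=1}^{m} B_{j2}Z_{j2} + \cdots + \sum_{j=1}^{m} B_{jn}Z_{jn}\\
&= \sum_{k=1}^{n} \sum_{j=1}^{m} B_{jk}Z_{jk} = \sum_{j=1}^{m} \sum_{k=1}^{n} B_{jk}Z_{jk}\\
&= \sum_{k=1}^{n} B_{1k}Z_{1k} + \sum_{k=1}^{n} B_{2k}Z_{2k} + \cdots + \sum_{k=1}^{n} B_{mk}Z_{mk}\\
&= \mathrm{tr}\left[\begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ \vdots & & & \vdots \\ Z_{m1} & Z_{m2} & \cdots & Z_{mn} \end{pmatrix}\begin{pmatrix} B_{11} & B_{21} & \cdots & B_{m1} \\ \vdots & & & \vdots \\ B_{1n} & B_{2n} & \cdots & B_{mn} \end{pmatrix}\right] = \mathrm{tr}(Z B^T)
\end{aligned}$$

Because $\mathrm{tr}\,A = \mathrm{tr}\,A^T$, we also have

$$\mathrm{tr}(B^T Z) = \mathrm{tr}(Z B^T) = \mathrm{tr}(Z^T B) = \mathrm{tr}(B Z^T)$$

□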

Lemma 28 Let $B, T \in \mathbb{C}^{n \times m}$ be complex matrices. Then

$$\mathrm{tr}(BT^T) = \sum_{i=1}^{n} \sum_{k=1}^{m} B_{ik} T_{ik}$$

Proof. Let $C = BT^T$. Element $(i, j)$ of $C$ is

$$C_{ij} = \sum_{k=1}^{m} B_{ik} T_{jk}$$

Then

$$\mathrm{tr}(BT^T) = \sum_{i=1}^{n} C_{ii} = \sum_{i=1}^{n} \sum_{k=1}^{m} B_{ik} T_{ik}$$

□


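Lemma 29 Let $B_{n \times m}$ and $Z_{n \times m}$ be complex matrices. Then

$$\mathrm{tr}(B^H Z) = \sum_{j=1}^{m} \sum_{k=1}^{n} B_{kj}^* Z_{kj}$$

Proof.

$$\mathrm{tr}(B^H Z) = \sum_{k=1}^{n} B_{k1}^* Z_{k1} + \sum_{k=1}^{n} B_{k2}^* Z_{k2} + \cdots + \sum_{k=1}^{n} B_{km}^* Z_{km} = \sum_{j=1}^{m} \sum_{k=1}^{n} B_{kj}^* Z_{kj}$$

□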
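Lemmas 26 through 29 are element-wise rewritings of a trace. This minimal pure-Python sketch checks them on random complex matrices; the seed and the sizes $m = 3$, $n = 4$ are arbitrary assumptions. Lemma 28 is the same sum as Lemma 27 with $T$ in place of $Z$, so it is covered by the Lemma 27 check. All helper names are hypothetical.

```python
import random

rng = random.Random(1)

def rand_mat(r, c):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def herm(a):
    return [[x.conjugate() for x in row] for row in zip(*a)]

def tr(a):
    return sum(a[i][i] for i in range(len(a)))

m, n = 3, 4

# Lemma 26: B is m x n, Z is n x m.
B, Z = rand_mat(m, n), rand_mat(n, m)
lemma26 = sum(B[j][k] * Z[k][j] for j in range(m) for k in range(n))
assert abs(tr(matmul(B, Z)) - lemma26) < 1e-9

# Lemma 27 (Lemma 28 is the same sum written for B T^T): B and Z both m x n.
B, Z = rand_mat(m, n), rand_mat(m, n)
lemma27 = sum(B[i][k] * Z[i][k] for i in range(m) for k in range(n))
for value in (tr(matmul(transpose(B), Z)), tr(matmul(Z, transpose(B))),
              tr(matmul(transpose(Z), B)), tr(matmul(B, transpose(Z)))):
    assert abs(value - lemma27) < 1e-9

# Lemma 29: B and Z both n x m; the conjugate enters through B^H.
B29, Z29 = rand_mat(n, m), rand_mat(n, m)
lemma29 = sum(B29[k][j].conjugate() * Z29[k][j] for j in range(m) for k in range(n))
assert abs(tr(matmul(herm(B29), Z29)) - lemma29) < 1e-9
```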
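Proposition 49 Let $B = B^H$ and $Z = Z^H$ be $n \times n$ complex matrices. Then

$$\mathrm{tr}(BZ) = \sum_{k=1}^{n} B_{kk} Z_{kk} + \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 2\,\mathrm{Re}\{B_{jk} Z_{kj}\}$$

Proof. Using $B_{kj} = B_{jk}^*$ and $Z_{jk} = Z_{kj}^*$,

$$\mathrm{tr}(BZ) = \mathrm{tr}\left[\begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1n} \\ B_{12}^* & B_{22} & \cdots & B_{2n} \\ \vdots & & & \vdots \\ B_{1n}^* & B_{2n}^* & \cdots & B_{nn} \end{pmatrix}\begin{pmatrix} Z_{11} & Z_{21}^* & \cdots & Z_{n1}^* \\ Z_{21} & Z_{22} & \cdots & Z_{n2}^* \\ \vdots & & & \vdots \\ Z_{n1} & Z_{n2} & \cdots & Z_{nn} \end{pmatrix}\right] = \sum_{j=1}^{n} \sum_{k=1}^{n} B_{jk} Z_{kj}$$

The diagonal terms $B_{kk}Z_{kk}$ are real. Each off-diagonal pair with $j < k$ contributes $B_{jk}Z_{kj} + B_{kj}Z_{jk} = B_{jk}Z_{kj} + (B_{jk}Z_{kj})^* = 2\,\mathrm{Re}\{B_{jk}Z_{kj}\}$, so

$$\mathrm{tr}(BZ) = \sum_{k=1}^{n} B_{kk} Z_{kk} + \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 2\,\mathrm{Re}\{B_{jk} Z_{kj}\}$$

□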
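Proposition 49 halves the work for Hermitian matrices: only the diagonal and the upper triangle are needed. This minimal pure-Python sketch checks the formula against the full trace for random $4 \times 4$ Hermitian matrices built as $(a + a^H)/2$. The seed and size are arbitrary assumptions, and the helper names are hypothetical.

```python
import random

rng = random.Random(2)

def rand_herm(n):
    """Random n x n Hermitian matrix (a + a^H) / 2."""
    a = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
    return [[(a[i][j] + a[j][i].conjugate()) / 2 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

n = 4
B, Z = rand_herm(n), rand_herm(n)
BZ = matmul(B, Z)
trace = sum(BZ[k][k] for k in range(n))

# Diagonal terms plus twice the real part of each upper-triangle product.
formula = (sum(B[k][k] * Z[k][k] for k in range(n))
           + sum(2 * (B[j][k] * Z[k][j]).real for j in range(n - 1) for k in range(j + 1, n)))
assert abs(trace - formula) < 1e-9
assert abs(trace.imag) < 1e-9  # tr(BZ) is real when B and Z are Hermitian
```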


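Lemma 30

$$\mathrm{tr}(Z^2) = \mathrm{tr}(ZZ) = \sum_{i=1}^{n} \sum_{j=1}^{n} Z_{ij} Z_{ji}$$

Proof.

$$\mathrm{tr}(ZZ) = \mathrm{tr}(Z^2) = \mathrm{tr}\left[\begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & & & \vdots \\ Z_{n1} & Z_{n2} & \cdots & Z_{nn} \end{pmatrix}\begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & & & \vdots \\ Z_{n1} & Z_{n2} & \cdots & Z_{nn} \end{pmatrix}\right] = \sum_{i=1}^{n} \sum_{j=1}^{n} Z_{ij} Z_{ji}$$

□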


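Proposition 50

$$\mathrm{tr}(ZZ^T) = \mathrm{tr}(Z^T Z) = \sum_{i=1}^{n} \sum_{j=1}^{n} Z_{ij}^2$$

Proof. Let

$$Z = (Z_1, \ldots, Z_n) = \begin{pmatrix} Z^1 \\ \vdots \\ Z^n \end{pmatrix}$$

where the $Z_i$ are column vectors and the $Z^i$ are row vectors. Then

$$\mathrm{tr}(Z^T Z) = \mathrm{tr}\left[\begin{pmatrix} Z_1^T \\ \vdots \\ Z_n^T \end{pmatrix}(Z_1 \cdots Z_n)\right] = \mathrm{tr}\begin{pmatrix} Z_1^T Z_1 & \cdots & Z_1^T Z_n \\ \vdots & & \vdots \\ Z_n^T Z_1 & \cdots & Z_n^T Z_n \end{pmatrix} = \sum_{j=1}^{n} Z_j^T Z_j = \sum_{j=1}^{n} \sum_{i=1}^{n} Z_{ij}^2$$

and $\mathrm{tr}(ZZ^T) = \mathrm{tr}(Z^T Z)$ by the cyclic property of the trace. □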


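Proposition 51 Let $Z \in \mathbb{C}^{p \times n}$. Then

$$\mathrm{tr}(Z^H Z) = \sum_{i=1}^{p} \sum_{j=1}^{n} |Z_{ij}|^2$$

Proof.

$$\mathrm{tr}(Z^H Z) = \mathrm{tr}\left[\begin{pmatrix} Z_{11}^* & Z_{21}^* & \cdots & Z_{p1}^* \\ Z_{12}^* & Z_{22}^* & \cdots & Z_{p2}^* \\ \vdots & & & \vdots \\ Z_{1n}^* & Z_{2n}^* & \cdots & Z_{pn}^* \end{pmatrix}\begin{pmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & & & \vdots \\ Z_{p1} & Z_{p2} & \cdots & Z_{pn} \end{pmatrix}\right]$$

Compute only the elements ending up on the diagonal:

$$\begin{aligned}
\mathrm{tr}(Z^H Z) &= (Z_{11}^*Z_{11} + Z_{21}^*Z_{21} + \cdots + Z_{p1}^*Z_{p1}) + (Z_{12}^*Z_{12} + Z_{22}^*Z_{22} + \cdots + Z_{p2}^*Z_{p2})\\
&\quad + \cdots + (Z_{1n}^*Z_{1n} + Z_{2n}^*Z_{2n} + \cdots + Z_{pn}^*Z_{pn})\\
&= \sum_{j=1}^{n} \sum_{i=1}^{p} |Z_{ij}|^2 = \sum_{i=1}^{p} \sum_{j=1}^{n} |Z_{ij}|^2 = \|Z\|_F^2
\end{aligned}$$

where $\|Z\|_F$ is the Frobenius norm for the matrix $Z$.

Alternate Proof. Let $Z = (Z_1, \ldots, Z_n)$ where $Z_i \in \mathbb{C}^p$. Then

$$\mathrm{tr}(Z^H Z) = \mathrm{tr}\left[\begin{pmatrix} Z_1^H \\ \vdots \\ Z_n^H \end{pmatrix}(Z_1 \cdots Z_n)\right] = \mathrm{tr}\begin{pmatrix} Z_1^H Z_1 & \cdots & Z_1^H Z_n \\ \vdots & & \vdots \\ Z_n^H Z_1 & \cdots & Z_n^H Z_n \end{pmatrix} = \sum_{j=1}^{n} Z_j^H Z_j = \sum_{j=1}^{n} \sum_{i=1}^{p} Z_{ij}^* Z_{ij} = \sum_{i=1}^{p} \sum_{j=1}^{n} |Z_{ij}|^2$$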

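Proposition 52 The two identities presented here are ones that occur in the complex matrix normal distribution. The derivation is so short that it will be included in the identity statement.

$$AA^H = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & & & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{pmatrix}\begin{pmatrix} A_{11}^* & A_{21}^* & \cdots & A_{n1}^* \\ A_{12}^* & A_{22}^* & \cdots & A_{n2}^* \\ \vdots & & & \vdots \\ A_{1n}^* & A_{2n}^* & \cdots & A_{nn}^* \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^n A_{1k}A_{1k}^* & \sum_{k=1}^n A_{1k}A_{2k}^* & \cdots & \sum_{k=1}^n A_{1k}A_{nk}^* \\ \sum_{k=1}^n A_{2k}A_{1k}^* & \sum_{k=1}^n A_{2k}A_{2k}^* & \cdots & \sum_{k=1}^n A_{2k}A_{nk}^* \\ \vdots & & & \vdots \\ \sum_{k=1}^n A_{nk}A_{1k}^* & \sum_{k=1}^n A_{nk}A_{2k}^* & \cdots & \sum_{k=1}^n A_{nk}A_{nk}^* \end{pmatrix}$$

Therefore

$$(AA^H)_{ii'} = \sum_{k=1}^{n} A_{ik} A_{i'k}^*$$

Similarly,

$$B^H B = \begin{pmatrix} B_{11}^* & B_{21}^* & \cdots & B_{n1}^* \\ B_{12}^* & B_{22}^* & \cdots & B_{n2}^* \\ \vdots & & & \vdots \\ B_{1n}^* & B_{2n}^* & \cdots & B_{nn}^* \end{pmatrix}\begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1n} \\ B_{21} & B_{22} & \cdots & B_{2n} \\ \vdots & & & \vdots \\ B_{n1} & B_{n2} & \cdots & B_{nn} \end{pmatrix} = \begin{pmatrix} \sum_{s=1}^n B_{s1}^*B_{s1} & \sum_{s=1}^n B_{s1}^*B_{s2} & \cdots & \sum_{s=1}^n B_{s1}^*B_{sn} \\ \sum_{s=1}^n B_{s2}^*B_{s1} & \sum_{s=1}^n B_{s2}^*B_{s2} & \cdots & \sum_{s=1}^n B_{s2}^*B_{sn} \\ \vdots & & & \vdots \\ \sum_{s=1}^n B_{sn}^*B_{s1} & \sum_{s=1}^n B_{sn}^*B_{s2} & \cdots & \sum_{s=1}^n B_{sn}^*B_{sn} \end{pmatrix}$$

Therefore

$$(B^H B)_{jj'} = \sum_{s=1}^{n} B_{sj}^* B_{sj'}$$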


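Lemma 31 Let $A \in \mathbb{C}^{m \times p}$, $Z \in \mathbb{C}^{p \times q}$, $B \in \mathbb{C}^{q \times m}$. Then

$$\mathrm{tr}(AZB) = \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{q} A_{ij} Z_{jk} B_{ki}$$

Proof.

$$\mathrm{tr}(AZB) = \mathrm{tr}\left[\begin{pmatrix} A^1 \\ \vdots \\ A^m \end{pmatrix} Z \,(B_1 \cdots B_m)\right]$$

where $A^i$ is a row vector and $B_j$ is a column vector. Then

$$\mathrm{tr}(AZB) = \mathrm{tr}\begin{pmatrix} A^1ZB_1 & A^1ZB_2 & \cdots & A^1ZB_m \\ A^2ZB_1 & A^2ZB_2 & \cdots & A^2ZB_m \\ \vdots & & & \vdots \\ A^mZB_1 & A^mZB_2 & \cdots & A^mZB_m \end{pmatrix} = \sum_{i=1}^{m} A^i Z B_i = \sum_{i=1}^{m} (A_{i1} \cdots A_{ip}) \begin{pmatrix} Z_{11} & \cdots & Z_{1q} \\ \vdots & & \vdots \\ Z_{p1} & \cdots & Z_{pq} \end{pmatrix} \begin{pmatrix} B_{1i} \\ \vdots \\ B_{qi} \end{pmatrix} = \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{q} A_{ij} Z_{jk} B_{ki}$$

□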
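The triple sum in Lemma 31 follows the subscript chain $i \to j \to k \to i$. This minimal pure-Python sketch checks it for random rectangular matrices with $m = 2$, $p = 3$, $q = 4$; the sizes and seed are arbitrary assumptions. The helper names are hypothetical.

```python
import random

rng = random.Random(4)

def rand_mat(r, c):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def tr(a):
    return sum(a[i][i] for i in range(len(a)))

m, p, q = 2, 3, 4
A, Z, B = rand_mat(m, p), rand_mat(p, q), rand_mat(q, m)

# The subscripts chain i -> j -> k -> i, closing the loop that makes it a trace.
triple = sum(A[i][j] * Z[j][k] * B[k][i]
             for i in range(m) for j in range(p) for k in range(q))
assert abs(tr(matmul(matmul(A, Z), B)) - triple) < 1e-9
```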
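Lemma 33 Let $A$, $B$, $C$, $D$ all be $p \times p$ (complex) matrices. Then

$$\mathrm{tr}(AB^HCD) = \sum_{i=1}^{p} \sum_{j=1}^{p} A_{ij} B_j^H C D_i$$

where $B = [B_1, B_2, \ldots, B_p]$ and $D = [D_1, D_2, \ldots, D_p]$.

Proof.

$$\mathrm{tr}(AB^HCD) = \mathrm{tr}\left[\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1p} \\ A_{21} & A_{22} & \cdots & A_{2p} \\ \vdots & & & \vdots \\ A_{p1} & A_{p2} & \cdots & A_{pp} \end{pmatrix}\begin{pmatrix} B_1^H \\ B_2^H \\ \vdots \\ B_p^H \end{pmatrix} C \,(D_1 \cdots D_p)\right] = \mathrm{tr}\left[\begin{pmatrix} A_{11} & \cdots & A_{1p} \\ \vdots & & \vdots \\ A_{p1} & \cdots & A_{pp} \end{pmatrix}\begin{pmatrix} B_1^HCD_1 & B_1^HCD_2 & \cdots & B_1^HCD_p \\ B_2^HCD_1 & B_2^HCD_2 & \cdots & B_2^HCD_p \\ \vdots & & & \vdots \\ B_p^HCD_1 & B_p^HCD_2 & \cdots & B_p^HCD_p \end{pmatrix}\right]$$

I do not have to do all the computations. I only need the sum of the diagonal elements of the product, therefore

$$\mathrm{tr}(AB^HCD) = \sum_{i=1}^{p} \sum_{j=1}^{p} A_{ij} B_j^H C D_i$$

Notice that the order of subscripts reverses. □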


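Proposition 53 Let $A$, $F$, $C$, $D$ all be $p \times p$ (complex) matrices. Then

$$\mathrm{tr}(AFCD) = \sum_{i=1}^{p} \sum_{j=1}^{p} A_{ij} F^j C D_i$$

where $F^j$ is the $j$th row of $F$, so that

$$F = \begin{pmatrix} F^1 \\ \vdots \\ F^p \end{pmatrix}$$

and $D = [D_1, D_2, \ldots, D_p]$.

Proof. This is a simple but useful corollary to lemma 33, with $F$ in place of $B^H$. □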


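Proposition 54 Let $A$, $B$, $C$, $D$ all be $p \times p$ (complex) matrices. Then

$$\mathrm{tr}(A^HBCD) = \sum_{i=1}^{p} \sum_{j=1}^{p} A_{ji}^* B_j C D_i$$

where $B_j$ is the $j$th row of $B$ and $D_i$ is the $i$th column of $D$.

Proof.

$$\mathrm{tr}(A^HBCD) = \mathrm{tr}\left[\begin{pmatrix} A_{11}^* & A_{21}^* & \cdots & A_{p1}^* \\ A_{12}^* & A_{22}^* & \cdots & A_{p2}^* \\ \vdots & & & \vdots \\ A_{1p}^* & A_{2p}^* & \cdots & A_{pp}^* \end{pmatrix}\begin{pmatrix} B_1 \\ B_2 \\ \vdots \\ B_p \end{pmatrix} C \,(D_1 \cdots D_p)\right]$$

where

$$B = \begin{pmatrix} B_1 \\ \vdots \\ B_p \end{pmatrix} \quad \text{and} \quad D = (D_1 \cdots D_p).$$

Then

$$\mathrm{tr}(A^HBCD) = \mathrm{tr}\left[\begin{pmatrix} A_{11}^* & A_{21}^* & \cdots & A_{p1}^* \\ A_{12}^* & A_{22}^* & \cdots & A_{p2}^* \\ \vdots & & & \vdots \\ A_{1p}^* & A_{2p}^* & \cdots & A_{pp}^* \end{pmatrix}\begin{pmatrix} B_1CD_1 & B_1CD_2 & \cdots & B_1CD_p \\ B_2CD_1 & B_2CD_2 & \cdots & B_2CD_p \\ \vdots & & & \vdots \\ B_pCD_1 & B_pCD_2 & \cdots & B_pCD_p \end{pmatrix}\right]$$

The sum of the diagonal elements is all that is needed.

$$\mathrm{tr}(A^HBCD) = \sum_{i=1}^{p} \sum_{j=1}^{p} A_{ji}^* B_j C D_i = \sum_{i=1}^{p} \sum_{j=1}^{p} \sum_{k=1}^{p} \sum_{l=1}^{p} A_{ji}^* B_{jk} C_{kl} D_{li}$$

□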
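The two four-matrix identities can be checked side by side. This minimal pure-Python sketch uses random $3 \times 3$ complex matrices; the size and seed are arbitrary assumptions. It checks the $\mathrm{tr}(AB^HCD)$ identity stated just before Proposition 53, where $B_j$ and $D_i$ are columns, and Proposition 54, where $B_j$ is a row and $D_i$ a column. The helpers `col` and `row` are hypothetical names.

```python
import random

rng = random.Random(5)

def rand_mat(r, c):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(c)] for _ in range(r)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def herm(a):
    return [[x.conjugate() for x in row] for row in zip(*a)]

def tr(a):
    return sum(a[i][i] for i in range(len(a)))

def col(m, i):
    """Column i of m as a p x 1 matrix."""
    return [[r[i]] for r in m]

def row(m, j):
    """Row j of m as a 1 x p matrix."""
    return [list(m[j])]

p = 3
A, B, C, D = (rand_mat(p, p) for _ in range(4))

# tr(A B^H C D) = sum_i sum_j A_ij (B_j^H C D_i), with B_j and D_i columns.
lhs33 = tr(matmul(matmul(matmul(A, herm(B)), C), D))
rhs33 = sum(A[i][j] * matmul(matmul(herm(col(B, j)), C), col(D, i))[0][0]
            for i in range(p) for j in range(p))
assert abs(lhs33 - rhs33) < 1e-9

# Proposition 54: tr(A^H B C D) = sum_i sum_j conj(A_ji) (B_j C D_i), B_j a row.
lhs54 = tr(matmul(matmul(matmul(herm(A), B), C), D))
rhs54 = sum(A[j][i].conjugate() * matmul(matmul(row(B, j), C), col(D, i))[0][0]
            for i in range(p) for j in range(p))
assert abs(lhs54 - rhs54) < 1e-9
```

The index reversal noted in the proof shows up as `A[i][j]` pairing with column `j` of `B` but column `i` of `D`.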


K.3  Inverse 


K.3.1  Partitioned  Matrix  Inverse 


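Lemma 34 Partitioned Matrix Right Inverse. Let $Z \in M_n(\mathbb{C})$ be partitioned

$$Z = \begin{pmatrix} A & C \\ B & D \end{pmatrix}$$

Let $ZY = I_n$, where

$$Y = \begin{pmatrix} R & T \\ S & U \end{pmatrix}$$

$Y$ is the right inverse of $Z$. Then

$$Z^{-1} = Y = \begin{pmatrix} (A - CD^{-1}B)^{-1} & -A^{-1}C(D - BA^{-1}C)^{-1} \\ -D^{-1}B(A - CD^{-1}B)^{-1} & (D - BA^{-1}C)^{-1} \end{pmatrix} = \begin{pmatrix} I_1 & -A^{-1}C \\ -D^{-1}B & I_2 \end{pmatrix}\begin{pmatrix} (A - CD^{-1}B)^{-1} & 0 \\ 0 & (D - BA^{-1}C)^{-1} \end{pmatrix}$$

$A$ and $D$ must be square matrices, and the formula requires $A$, $D$, and both Schur complements to be nonsingular. This is Graybill theorem 8.2.1 [95].

Proof. Although I did the following proof, it is a common and easy proof that must have been done by many people.

$$ZY = \begin{pmatrix} A & C \\ B & D \end{pmatrix}\begin{pmatrix} R & T \\ S & U \end{pmatrix} = \begin{pmatrix} AR + CS & AT + CU \\ BR + DS & BT + DU \end{pmatrix} = \begin{pmatrix} I_1 & 0_2 \\ 0_1 & I_2 \end{pmatrix}$$

This implies

$$DS = -BR \implies S = -D^{-1}BR$$

$$AT = -CU \implies T = -A^{-1}CU$$

Substituting into the main block diagonal terms,

$$AR + C(-D^{-1}BR) = (A - CD^{-1}B)R = I_1 \implies R = (A - CD^{-1}B)^{-1}$$

$$B(-A^{-1}CU) + DU = (D - BA^{-1}C)U = I_2 \implies U = (D - BA^{-1}C)^{-1}$$

Substituting back into $Y$, we obtain

$$Z^{-1} = Y = \begin{pmatrix} I_1 & -A^{-1}C \\ -D^{-1}B & I_2 \end{pmatrix}\begin{pmatrix} (A - CD^{-1}B)^{-1} & 0 \\ 0 & (D - BA^{-1}C)^{-1} \end{pmatrix}$$

□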
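The block formulas of Lemmas 34 and 35 can be checked on a small example. This minimal pure-Python sketch uses $2 \times 2$ blocks chosen by hand (an assumption, not from the text). The blocks make $Z$ strictly diagonally dominant, so $A$, $D$, $Z$ and both Schur complements are nonsingular. It assembles the right-inverse and left-inverse forms and confirms that $ZY = I$ and $YZ = I$. The helper names are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def neg(a):
    return [[-x for x in r] for r in a]

def inv2(a):
    """Inverse of a nonsingular 2 x 2 block."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def assemble(top_left, top_right, bottom_left, bottom_right):
    """Build [[TL, TR], [BL, BR]] from 2 x 2 blocks."""
    top = [r1 + r2 for r1, r2 in zip(top_left, top_right)]
    bottom = [r1 + r2 for r1, r2 in zip(bottom_left, bottom_right)]
    return top + bottom

def is_identity(m, tol=1e-9):
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m)))

# Illustrative blocks of Z = [[A, C], [B, D]] (strictly diagonally dominant by rows).
A = [[4, 1j], [0.5, 3]]
C = [[1, 0.2], [0, 1j]]
B = [[0.3j, 1], [0.5, -1]]
D = [[5, 1], [1j, 4]]
Z = assemble(A, C, B, D)

A_inv, D_inv = inv2(A), inv2(D)
P = inv2(sub(A, matmul(matmul(C, D_inv), B)))  # (A - C D^{-1} B)^{-1}
Q = inv2(sub(D, matmul(matmul(B, A_inv), C)))  # (D - B A^{-1} C)^{-1}

# Lemma 34 (right inverse) and Lemma 35 (left inverse) block formulas.
Y_right = assemble(P, neg(matmul(matmul(A_inv, C), Q)),
                   neg(matmul(matmul(D_inv, B), P)), Q)
Y_left = assemble(P, neg(matmul(matmul(P, C), D_inv)),
                  neg(matmul(matmul(Q, B), A_inv)), Q)

print(is_identity(matmul(Z, Y_right)), is_identity(matmul(Y_left, Z)))  # True True
```

Because $Z$ is square, its left and right inverses coincide, so the two block forms give the same matrix.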
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Lemma  35  Partitioned  Matrix  Left  Inverse.  Let  Z  G  Mn{C)  be  partitioned 


A  C 


B  D 


Let  ZY  =  In  where 


R  T 


S  U 


Y  is  the  Left  Inverse  of  Z.  Then 


=  y  = 


iA-CD-^B)-^{Iu-CD-^) 

{D  -  BA-^C)-^{-BA-\h) 


A  and  D  must  be  square  matrices.  Although  I  did  this  theorem  and  its  proof, 
it  is  so  basic  that  it  must  have  been  done  before. 


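Proof.

$$YZ = \begin{pmatrix} R & T \\ S & U \end{pmatrix}\begin{pmatrix} A & C \\ B & D \end{pmatrix} = \begin{pmatrix} RA + TB & RC + TD \\ SA + UB & SC + UD \end{pmatrix} = \begin{pmatrix} I_1 & O_2 \\ O_1 & I_2 \end{pmatrix}.$$

This implies

$$SA = -UB \implies S = -UBA^{-1}, \qquad TD = -RC \implies T = -RCD^{-1}.$$

Substituting back into the main block diagonal terms,

$$RA - RCD^{-1}B = R(A - CD^{-1}B) = I_1 \implies R = (A - CD^{-1}B)^{-1}$$

and

$$-UBA^{-1}C + UD = U(D - BA^{-1}C) = I_2 \implies U = (D - BA^{-1}C)^{-1}.$$

Substituting back into $Y$, we obtain

$$Z^{-1} = Y = \begin{pmatrix} (A - CD^{-1}B)^{-1} & -(A - CD^{-1}B)^{-1}CD^{-1} \\ -(D - BA^{-1}C)^{-1}BA^{-1} & (D - BA^{-1}C)^{-1} \end{pmatrix} = \begin{pmatrix} (A - CD^{-1}B)^{-1}\,(I_1,\ -CD^{-1}) \\ (D - BA^{-1}C)^{-1}\,(-BA^{-1},\ I_2) \end{pmatrix}.$$

□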
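A companion sketch for lemma 35, under the same assumptions (NumPy, a hypothetical helper name, diagonally shifted random blocks), stacks the two block rows of $Y$ and checks $YZ = I_n$.

```python
import numpy as np

def left_block_inverse(A, B, C, D):
    """Lemma 35: Y = diag(R, U) [[I, -C D^{-1}], [-B A^{-1}, I]], so that Y Z = I."""
    inv = np.linalg.inv
    R = inv(A - C @ inv(D) @ B)
    U = inv(D - B @ inv(A) @ C)
    p, q = A.shape[0], D.shape[0]
    top = R @ np.hstack([np.eye(p), -C @ inv(D)])      # (A - CD^{-1}B)^{-1} (I_1, -CD^{-1})
    bottom = U @ np.hstack([-B @ inv(A), np.eye(q)])   # (D - BA^{-1}C)^{-1} (-BA^{-1}, I_2)
    return np.vstack([top, bottom])

rng = np.random.default_rng(1)
p, q = 2, 3
A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p)) + 4 * np.eye(p)
D = rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q)) + 4 * np.eye(q)
B = rng.standard_normal((q, p)) + 1j * rng.standard_normal((q, p))
C = rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q))
Z = np.block([[A, C], [B, D]])
Y = left_block_inverse(A, B, C, D)
```

Because a two-sided inverse is unique, the matrices produced by lemmas 34 and 35 agree whenever all the required inverses exist; only their factorizations differ.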

K.3.2  Complex  Matrix  Inversion  Lemmas 

Matrix inversion lemmas are frequently encountered in applied time series analysis. They are particularly useful when formulating Kalman filter algorithms. They are included here for the sake of completeness within the subject of complex matrix theory for acoustic signal processing.

Lemma 36 Let $A$ and $B$ be $n \times n$ nonsingular complex matrices. Let $Y$ be an $m \times m$ nonsingular complex matrix. Let $X$ be an $m \times n$ complex matrix. Then the following two expressions imply each other: if one is true, then the other is true, provided the necessary matrix inverses exist.

$$A^{-1} = B^{-1} + X^HY^{-1}X \iff A = B - BX^H(XBX^H + Y)^{-1}XB$$

Proof. This proof follows Sinha and Kuszta's [246] proof for the case of real variables.

1 Let $A^{-1} = B^{-1} + X^HY^{-1}X$.

2 (1) $\Rightarrow$ $A^{-1} - B^{-1} = X^HY^{-1}X$.

3 (1) $\Rightarrow$ $AA^{-1}BX^H = A(B^{-1} + X^HY^{-1}X)BX^H$.

4 (3) $\Rightarrow$ $BX^H = AX^H + AX^HY^{-1}XBX^H$.

5 (4) $\Rightarrow$ $BX^H = AX^HY^{-1}(Y + XBX^H)$.

6 (5) $\Rightarrow$ $BX^H(Y + XBX^H)^{-1}XB = AX^HY^{-1}(Y + XBX^H)(Y + XBX^H)^{-1}XB = AX^HY^{-1}XB$.

7 (6) and (2) $\Rightarrow$ $B - BX^H(Y + XBX^H)^{-1}XB = B - AX^HY^{-1}XB = B - A(A^{-1} - B^{-1})B = B - B + A = A$.

8 $A = B - BX^H(XBX^H + Y)^{-1}XB$.

□
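Lemma 36 is a form of what is commonly called the Woodbury identity. A minimal NumPy sketch (the function name `lemma36_rhs` is hypothetical; the random test matrices are scaled by assumption so that every required inverse exists comfortably) compares the two sides:

```python
import numpy as np

def lemma36_rhs(B, X, Y):
    """Right-hand side of lemma 36: B - B X^H (X B X^H + Y)^{-1} X B."""
    XH = X.conj().T
    return B - B @ XH @ np.linalg.solve(X @ B @ XH + Y, X @ B)

rng = np.random.default_rng(2)
n, m = 4, 2
B = np.eye(n) + 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
Y = 4 * np.eye(m) + 0.1 * (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
X = 0.2 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))

# Left-hand side: A defined through A^{-1} = B^{-1} + X^H Y^{-1} X
A_lhs = np.linalg.inv(np.linalg.inv(B) + X.conj().T @ np.linalg.inv(Y) @ X)
A_rhs = lemma36_rhs(B, X, Y)
print(np.allclose(A_lhs, A_rhs))
```

Solving the $m \times m$ system with `np.linalg.solve`, rather than inverting $n \times n$ matrices, is the usual reason to prefer the right-hand form when $m \ll n$.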

Lemma 37 Let $A$ and $B$ be $n \times n$ nonsingular complex matrices. Let $Y$ be an $m \times m$ nonsingular complex matrix. Let $X$ be an $m \times n$ complex matrix. Then the following two expressions imply each other: if one is true, then the other is true, provided the necessary matrix inverses exist.

$$A^{-1} = B^{-1} - X^HY^{-1}X \iff A = B + BX^H(Y - XBX^H)^{-1}XB$$

Proof. This proof follows Sinha and Kuszta's [246] proof for the case of real variables.

1 Let $A^{-1} = B^{-1} - X^HY^{-1}X$.

2 (1) $\Rightarrow$ $A^{-1} - B^{-1} = -X^HY^{-1}X$, that is, $B^{-1} - A^{-1} = X^HY^{-1}X$.

3 (1) $\Rightarrow$ $AA^{-1}BX^H = A(B^{-1} - X^HY^{-1}X)BX^H$.

4 (3) $\Rightarrow$ $BX^H = AX^H - AX^HY^{-1}XBX^H$.

5 (4) $\Rightarrow$ $BX^H = AX^HY^{-1}(Y - XBX^H)$.

6 (5) $\Rightarrow$ $BX^H(Y - XBX^H)^{-1}XB = AX^HY^{-1}(Y - XBX^H)(Y - XBX^H)^{-1}XB = AX^HY^{-1}XB$.

7 (6) and (2) $\Rightarrow$ $B + BX^H(Y - XBX^H)^{-1}XB = B + AX^HY^{-1}XB = B + A(B^{-1} - A^{-1})B = B + A - B = A$.

8 $A = B + BX^H(Y - XBX^H)^{-1}XB$.

□
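The same kind of check applies to lemma 37, where the sign in front of $X^HY^{-1}X$ is flipped. This is a sketch assuming NumPy; `lemma37_rhs` is a hypothetical name, and $X$ is scaled down by assumption so that $B^{-1} - X^HY^{-1}X$ stays safely nonsingular.

```python
import numpy as np

def lemma37_rhs(B, X, Y):
    """Right-hand side of lemma 37: B + B X^H (Y - X B X^H)^{-1} X B."""
    XH = X.conj().T
    return B + B @ XH @ np.linalg.solve(Y - X @ B @ XH, X @ B)

rng = np.random.default_rng(3)
n, m = 4, 2
B = np.eye(n) + 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
Y = 4 * np.eye(m) + 0.1 * (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
X = 0.2 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))

# Left-hand side: A defined through A^{-1} = B^{-1} - X^H Y^{-1} X
A_lhs = np.linalg.inv(np.linalg.inv(B) - X.conj().T @ np.linalg.inv(Y) @ X)
A_rhs = lemma37_rhs(B, X, Y)
```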


Lemma 38 Let $A$ and $B$ be $n \times n$ nonsingular complex matrices. Let $X$ be an $n \times m$ complex matrix such that $(I - X^HBX)^{-1}$ exists. Then the following two expressions imply each other: if one is true, then the other is true, provided the necessary matrix inverses exist.

$$A^{-1} = B^{-1} - XX^H \iff A = B + BX(I - X^HBX)^{-1}X^HB$$

Proof. Let $A^{-1} = B^{-1} - XX^H$. Post-multiply by $BX$ and premultiply by $A$ to get

$$AA^{-1}BX = AB^{-1}BX - AXX^HBX.$$

This implies

$$BX = AX - AXX^HBX = AX(I - X^HBX).$$

Post-multiply by $(I - X^HBX)^{-1}X^HB$ to get

$$BX(I - X^HBX)^{-1}X^HB = AXX^HB.$$

Add $B$ to both sides to get

$$B + BX(I - X^HBX)^{-1}X^HB = B + AXX^HB.$$

Rearranging the initial equation gives $XX^H = B^{-1} - A^{-1}$, which implies

$$B + BX(I - X^HBX)^{-1}X^HB = B + A(B^{-1} - A^{-1})B = B + AB^{-1}B - AA^{-1}B = A.$$

□
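A short numerical check of lemma 38, assuming NumPy. Here $X$ is $n \times m$, the function name is hypothetical, and the scaling of $X$ is an assumption chosen so that $B^{-1} - XX^H$ and $I - X^HBX$ are comfortably invertible.

```python
import numpy as np

def lemma38_rhs(B, X):
    """Right-hand side of lemma 38: B + B X (I - X^H B X)^{-1} X^H B."""
    XH = X.conj().T
    m = X.shape[1]
    return B + B @ X @ np.linalg.solve(np.eye(m) - XH @ B @ X, XH @ B)

rng = np.random.default_rng(4)
n, m = 4, 2
B = np.eye(n) + 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
X = 0.15 * (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))

A_lhs = np.linalg.inv(np.linalg.inv(B) - X @ X.conj().T)   # A^{-1} = B^{-1} - X X^H
A_rhs = lemma38_rhs(B, X)
```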

Lemma 39 Let $A$ and $B$ be $n \times n$ nonsingular complex matrices. Let $X$ be an $n \times m$ complex matrix such that $(I + X^HBX)^{-1}$ exists. Then the following two expressions imply each other: if one is true, then the other is true, provided the necessary matrix inverses exist.

$$A^{-1} = B^{-1} + XX^H \iff A = B - BX(I + X^HBX)^{-1}X^HB$$

Proof. Let $A^{-1} = B^{-1} + XX^H$. Post-multiply by $BX$ and premultiply by $A$ to get

$$AA^{-1}BX = AB^{-1}BX + AXX^HBX.$$

This implies

$$BX = AX + AXX^HBX = AX(I + X^HBX).$$

Post-multiply by $(I + X^HBX)^{-1}X^HB$ to get

$$BX(I + X^HBX)^{-1}X^HB = AXX^HB.$$

Add $B$ to both sides to get

$$B + BX(I + X^HBX)^{-1}X^HB = B + AXX^HB.$$

Rearranging the initial equation gives $XX^H = A^{-1} - B^{-1}$, which implies

$$B + BX(I + X^HBX)^{-1}X^HB = B + A(A^{-1} - B^{-1})B = B + B - A = 2B - A.$$

Solve for $A$ to get the final answer. □
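The plus-sign counterpart, lemma 39, can be checked the same way (NumPy assumed; the function name is hypothetical and the random test matrices are illustrative only):

```python
import numpy as np

def lemma39_rhs(B, X):
    """Right-hand side of lemma 39: B - B X (I + X^H B X)^{-1} X^H B."""
    XH = X.conj().T
    m = X.shape[1]
    return B - B @ X @ np.linalg.solve(np.eye(m) + XH @ B @ X, XH @ B)

rng = np.random.default_rng(5)
n, m = 4, 2
B = np.eye(n) + 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
X = 0.5 * (rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))

A_lhs = np.linalg.inv(np.linalg.inv(B) + X @ X.conj().T)   # A^{-1} = B^{-1} + X X^H
A_rhs = lemma39_rhs(B, X)
```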
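where $e_i$ is the standard basis vector of length $l$ consisting of all zeros, except a 1 in position $i$. This is often used in deriving Kalman filters.

Proof. Let

$$W = \begin{pmatrix} A & B^H \\ B & D \end{pmatrix}, \qquad W^{-1} = \begin{pmatrix} R & S^H \\ S & U \end{pmatrix},$$

where $A = Y^HY$, $B = Z^HY$, and $D = Z^HZ$. By lemma 34, $R^{-1} = A - B^HD^{-1}B$. By lemma 37,

$$R = A^{-1} + A^{-1}B^H(D - BA^{-1}B^H)^{-1}BA^{-1}$$

$$= (Y^HY)^{-1} + (Y^HY)^{-1}Y^HZ\,(Z^HZ - Z^HY(Y^HY)^{-1}Y^HZ)^{-1}Z^HY(Y^HY)^{-1}$$

$$= (Y^HY)^{-1} + (Y^HY)^{-1}Y^HZ\,[Z^H\{I - Y(Y^HY)^{-1}Y^H\}Z]^{-1}Z^HY(Y^HY)^{-1}.$$

Since $Z$ is a column vector, the bracketed term is a scalar, and

$$R = (Y^HY)^{-1} + \frac{(Y^HY)^{-1}Y^HZZ^HY(Y^HY)^{-1}}{Z^H\{I - Y(Y^HY)^{-1}Y^H\}Z}.$$

The $(i,i)$th element of $R$ is $e_i^TRe_i$, given by

$$r_{ii} = e_i^T(Y^HY)^{-1}e_i + \frac{e_i^T(Y^HY)^{-1}Y^HZZ^HY(Y^HY)^{-1}e_i}{Z^H\{I - Y(Y^HY)^{-1}Y^H\}Z} = e_i^T(Y^HY)^{-1}e_i + \frac{|e_i^T(Y^HY)^{-1}Y^HZ|^2}{Z^H\{I - Y(Y^HY)^{-1}Y^H\}Z}.$$

□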
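The closed form for $r_{ii}$ can be compared against a direct inverse of $W = (Y, Z)^H(Y, Z)$. This is a sketch assuming NumPy; `lemma40_diag` is a hypothetical name, $Z$ is passed as the 1-D array `z`, and $Y$ is a random $6 \times 3$ complex matrix chosen only for illustration.

```python
import numpy as np

def lemma40_diag(Y, z):
    """Diagonal of the upper-left block R of W^{-1}, W = (Y, z)^H (Y, z), z a column vector."""
    YH = Y.conj().T
    G = np.linalg.inv(YH @ Y)                 # (Y^H Y)^{-1}
    P = np.eye(Y.shape[0]) - Y @ G @ YH       # I - Y (Y^H Y)^{-1} Y^H
    denom = (z.conj() @ P @ z).real           # Z^H {I - Y (Y^H Y)^{-1} Y^H} Z, real and > 0
    g = G @ YH @ z                            # entry i is e_i^T (Y^H Y)^{-1} Y^H Z
    return np.diag(G).real + np.abs(g) ** 2 / denom

rng = np.random.default_rng(6)
Y = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
z = rng.standard_normal(6) + 1j * rng.standard_normal(6)
YZ = np.column_stack([Y, z])
W_inv = np.linalg.inv(YZ.conj().T @ YZ)
r = lemma40_diag(Y, z)
```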


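Lemma 41 Let $W > 0$ be partitioned as

$$W = \begin{pmatrix} Y^HY & Y^HZ \\ Z^HY & Z^HZ \end{pmatrix} = (Y, Z)^H(Y, Z),$$

where $Y$ is a complex matrix and $Z$ is a column vector. Then the lower-right (scalar) element of $W^{-1}$ is given by

$$\frac{1}{Z^H\{I - Y(Y^HY)^{-1}Y^H\}Z}.$$

This is often used in deriving Kalman filters.

Proof. Let

$$W = \begin{pmatrix} A & B^H \\ B & D \end{pmatrix}, \qquad W^{-1} = \begin{pmatrix} R & S^H \\ S & U \end{pmatrix}.$$

By lemma 34, $U^{-1} = D - BA^{-1}B^H$. Thus,

$$U = [Z^HZ - Z^HY(Y^HY)^{-1}Y^HZ]^{-1} = [Z^H(I - Y(Y^HY)^{-1}Y^H)Z]^{-1}.$$

The $(j,j)$th element of $U$ is given by

$$u_{jj} = e_j^TUe_j = e_j^T[Z^H(I - Y(Y^HY)^{-1}Y^H)Z]^{-1}e_j = \frac{1}{Z^H\{I - Y(Y^HY)^{-1}Y^H\}Z},$$

where the last equality holds because $Z$ is a column vector, so that $U$ is $1 \times 1$, and $e_j$ is the standard basis vector of all zeros, except for a 1 in position $j$. □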
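A minimal NumPy check of lemma 41 (the helper name is hypothetical; the random $Y$ and $z$ are illustrative only) compares the closed form with the last diagonal entry of a directly computed $W^{-1}$:

```python
import numpy as np

def lemma41_corner(Y, z):
    """1 / (Z^H {I - Y (Y^H Y)^{-1} Y^H} Z) for a column vector Z (passed as 1-D array z)."""
    YH = Y.conj().T
    P = np.eye(Y.shape[0]) - Y @ np.linalg.solve(YH @ Y, YH)   # I - Y (Y^H Y)^{-1} Y^H
    return 1.0 / (z.conj() @ P @ z).real

rng = np.random.default_rng(7)
Y = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
z = rng.standard_normal(6) + 1j * rng.standard_normal(6)
YZ = np.column_stack([Y, z])
W_inv = np.linalg.inv(YZ.conj().T @ YZ)
u = lemma41_corner(Y, z)
```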


K.4  Determinants 
K.4.1 Basic Properties

Definition 78 Let $X \in \mathbb{C}^{n \times n}$ have elements $(x_{ij})$. The matrix formed by deleting row $i$ and column $j$ from $X$ is called the minor of $x_{ij}$.

Definition 79 Let $A \in \mathbb{C}^{n \times n}$ have elements $(a_{ij})$. Let $X_{ij}$ be the minor of $a_{ij}$. Then

$$\alpha_{ij} = (-1)^{i+j}\det(X_{ij})$$

is the cofactor of $a_{ij}$. Note that the cofactor is a scalar.

Proposition 55 Let $A \in \mathbb{C}^{n \times n}$ have elements $(a_{ij})$ and cofactors $(\alpha_{ij})$. Then, for $i \neq j$,

$$\sum_{k=1}^n a_{ik}\alpha_{jk} = 0. \qquad \text{Similarly,} \qquad \sum_{k=1}^n a_{ki}\alpha_{kj} = 0.$$

This is Ayres problem 3.10 [35].


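Proof. This is essentially the proof given by Ayres, with statements made slightly more explicit. $\sum_{k=1}^n a_{ki}\alpha_{kj}$ is the determinant of some matrix. Call it $B$. For example, let $x \in \mathbb{C}^n$. Then, by the cofactor expansion along column $j$,

$$\det P = \sum_{k=1}^n x_k\alpha_{kj} \overset{\text{def}}{=} \det\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1,j-1} & x_1 & a_{1,j+1} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2,j-1} & x_2 & a_{2,j+1} & \cdots & a_{2n} \\ \vdots & & & & \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n,j-1} & x_n & a_{n,j+1} & \cdots & a_{nn} \end{pmatrix}.$$

Substituting $a_{ki}$ for $x_k$, we get

$$\det B = \sum_{k=1}^n a_{ki}\alpha_{kj} = \det\begin{pmatrix} a_{11} & \cdots & a_{1i} & \cdots & a_{1,j-1} & a_{1i} & a_{1,j+1} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2i} & \cdots & a_{2,j-1} & a_{2i} & a_{2,j+1} & \cdots & a_{2n} \\ \vdots & & & & & \vdots & & & \vdots \\ a_{n1} & \cdots & a_{ni} & \cdots & a_{n,j-1} & a_{ni} & a_{n,j+1} & \cdots & a_{nn} \end{pmatrix}.$$

When $i \neq j$, $B$ now has two identical columns (columns $i$ and $j$), so its determinant is zero. [However, when $x$ is not a linear combination of the columns of $A$ other than column $j$, then $\det P = \sum_{k=1}^n x_k\alpha_{kj}$ is not necessarily zero, but rather is the determinant of a brand new matrix.] A similar proof applies for the row expansion case. □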


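Proposition 56 $A(\operatorname{adj}A) = \det(A)\,I_n$ for $A \in \mathbb{C}^{n \times n}$. This is Ayres equation 6.2 [35]. When $A^{-1}$ exists, then $A^{-1} = \operatorname{adj}A/\det A$.

Proof. This proof is an expansion of Ayres' proof. It was motivated by wondering whether

$$\operatorname{adj}A = [(-1)^{i+j}\det(X_{ij})]^T \qquad \text{or} \qquad \operatorname{adj}A = [(-1)^{i+j}\det(X_{ij})]^H,$$

where $X_{ij}$ is the minor of $a_{ij}$ for $A = (a_{ij})$. Let

$$\operatorname{adj}A = [(-1)^{i+j}\det(X_{ij})]^T = (\alpha_{ij})^T,$$

where $\alpha_{ij}$ is the cofactor of $a_{ij}$. Then examine the product

$$A(\operatorname{adj}A) = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{n1} \\ \vdots & & \vdots \\ \alpha_{1n} & \cdots & \alpha_{nn} \end{pmatrix} = \begin{pmatrix} \sum_k a_{1k}\alpha_{1k} & \sum_k a_{1k}\alpha_{2k} & \cdots & \sum_k a_{1k}\alpha_{nk} \\ \sum_k a_{2k}\alpha_{1k} & \sum_k a_{2k}\alpha_{2k} & \cdots & \sum_k a_{2k}\alpha_{nk} \\ \vdots & & & \vdots \\ \sum_k a_{nk}\alpha_{1k} & \sum_k a_{nk}\alpha_{2k} & \cdots & \sum_k a_{nk}\alpha_{nk} \end{pmatrix},$$

where each sum runs over $k = 1, \dots, n$. Recall that we proved earlier that $\sum_{k=1}^n a_{ik}\alpha_{jk} = 0$ for $i \neq j$, while $\sum_{k=1}^n a_{ik}\alpha_{ik} = \det A$ is the cofactor expansion along row $i$. Thus

$$A(\operatorname{adj}A) = \begin{pmatrix} \det A & & \\ & \ddots & \\ & & \det A \end{pmatrix} = I_n\det A.$$

Further, when $A^{-1}$ exists, then

$$A^{-1}A\operatorname{adj}(A) = A^{-1}I_n\det(A),$$

which implies

$$A^{-1} = \frac{\operatorname{adj}A}{\det A}.$$

Therefore

$$\operatorname{adj}A = [(-1)^{i+j}\det(X_{ij})]^T$$

works: the transpose, not the conjugate transpose, is the correct choice. □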
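The transpose-versus-conjugate-transpose question in proposition 56 is easy to probe numerically. A minimal sketch, assuming NumPy (the helper names are ours): it forms the cofactors from Definition 79 by explicit row and column deletion and checks $A(\operatorname{adj}A) = \det(A)I_n$ for a random complex $A$, while the conjugate-transpose alternative generally fails.

```python
import numpy as np

def cofactor_matrix(A):
    """alpha_ij = (-1)^(i+j) det(X_ij), X_ij = A with row i and column j deleted."""
    n = A.shape[0]
    cof = np.empty((n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof

def adjugate(A):
    return cofactor_matrix(A).T                 # transpose, NOT conjugate transpose

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
lhs = A @ adjugate(A)                           # Proposition 56: det(A) I_n
wrong = A @ cofactor_matrix(A).conj().T         # the conjugate-transpose alternative
```

The off-diagonal zeros of `lhs` are exactly the alien-cofactor sums of proposition 55.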

Proposition 57 $[\det(A)]^{-1} = \det(A^{-1})$.

Proof. $\det(AB) = \det(A)\det(B)$. Let $B = A^{-1}$. Therefore

$$\det(AA^{-1}) = \det(I) = 1 = \det(A)\det(A^{-1}).$$

This implies $[\det(A)]^{-1} = \det(A^{-1})$. □

Lemma 42 $\det(A^*) = (\det A)^*$.

Proof. $A^* = (a^*_{ij})$. By definition of the determinant,

$$\det(A^*) = \sum_{\sigma \in S_n}(\operatorname{sgn}\sigma)\,a^*_{1\sigma_1}a^*_{2\sigma_2}\cdots a^*_{n\sigma_n},$$

where $S_n$ is the set of all permutations of the ordered set $(1, 2, \dots, n)$. Therefore $\sigma$ takes on $n!$ different permutations of the form

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & \cdots & n \\ \sigma_1 & \sigma_2 & \sigma_3 & \cdots & \sigma_n \end{pmatrix}.$$

Note that

$$\operatorname{sgn}\sigma = \begin{cases} +1, & \text{if } \sigma \text{ is an even permutation of } (1, 2, \dots, n) \\ -1, & \text{if } \sigma \text{ is an odd permutation of } (1, 2, \dots, n). \end{cases}$$

Then, since $\operatorname{sgn}\sigma$ is real,

$$\det(A^*) = \Big\{\sum_{\sigma \in S_n}(\operatorname{sgn}\sigma)\,a_{1\sigma_1}a_{2\sigma_2}\cdots a_{n\sigma_n}\Big\}^* = (\det A)^*.$$

□

Proposition 58 $\det(A^H) = (\det A)^* = (\det A)^H$.

Proof.

$$\det(A^H) = \det[(A^*)^T] = \det(A^*) = (\det A)^* = (\det A)^H,$$

since $\det(A)$ is a scalar and is thus equal to its transpose. □

Lemma 43 Let $A$ be a unitary complex $n \times n$ matrix. Then $\det(A) = e^{i\theta}$ for some $\theta \in \mathbb{R}$.

Proof. If $A$ is unitary, then $A^H = A^{-1}$. Thus $A^HA = I$, and

$$\det(A^HA) = \det(I) = 1 = \det(A^H)\det(A) = |\det A|^2.$$

Therefore $\det A = e^{i\theta}$ for some $\theta \in \mathbb{R}$. □

Lemma 44 Let $A$ be an orthonormal complex matrix, that is, $A^TA = I$. Then $\det A = \pm 1$.

Proof. If $A$ is orthonormal, then $A^TA = I$ and hence $A^T = A^{-1}$. Then

$$\det(A^TA) = \det I = 1 = (\det A^T)(\det A) = (\det A)^2.$$

Then $\det A = \pm 1$. □
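The determinant facts in propositions 57 and 58 and lemmas 42 to 44 can be spot-checked numerically. A sketch assuming NumPy; the complex rotation `O` below is our own illustrative example of a matrix with $O^TO = I$ that is not unitary, chosen to show that lemma 44 is distinct from lemma 43.

```python
import numpy as np

det = np.linalg.det
rng = np.random.default_rng(9)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

checks = {
    "Prop. 57": np.isclose(det(np.linalg.inv(A)), 1 / det(A)),
    "Lemma 42": np.isclose(det(A.conj()), np.conj(det(A))),     # A* = elementwise conjugate
    "Prop. 58": np.isclose(det(A.conj().T), np.conj(det(A))),
}
Q, _ = np.linalg.qr(A)                                           # Q is unitary
checks["Lemma 43"] = np.isclose(abs(det(Q)), 1.0)

z = 0.3 + 0.7j                                                   # complex rotation angle
O = np.array([[np.cos(z), -np.sin(z)], [np.sin(z), np.cos(z)]])  # O^T O = I since cos^2 + sin^2 = 1
checks["Lemma 44"] = np.allclose(O.T @ O, np.eye(2)) and np.isclose(det(O), 1.0)

for name, ok in checks.items():
    print(name, bool(ok))
```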

Proposition 59 If $A$ is a skew-Hermitian matrix, then $[I + A][I - A]^{-1}$ is unitary. This comes from Littlewood (p. 19) [167].

Proof. I have supplied the following proof. Let $A^H = -A$. Note that

$$(I - A)(I + A) = I - A^2 = (I + A)(I - A).$$

Then

$$\{(I + A)(I - A)^{-1}\}^H[(I + A)(I - A)^{-1}] = (I - A)^{-H}(I + A)^H(I + A)(I - A)^{-1}$$

$$= (I - A)^{-H}(I - A)(I + A)(I - A)^{-1}$$

$$= (I - A)^{-H}(I + A)(I - A)(I - A)^{-1}$$

$$= (I - A)^{-H}(I + A) = [(I + A)^H(I - A)^{-1}]^H$$

$$= [(I - A)(I - A)^{-1}]^H = I^H = I.$$

Therefore $(I + A)(I - A)^{-1}$ is unitary. (The inverse $(I - A)^{-1}$ always exists here, because the eigenvalues of a skew-Hermitian matrix are purely imaginary.) □

Proposition 60 If $B$ is unitary and $-1$ is not a characteristic root of $B$, then there exists a skew-Hermitian matrix $A$ such that

$$B = [I + A][I - A]^{-1}.$$

This comes from Littlewood (p. 19) [167].

Proof. I supplied the following proof. $B$ is unitary implies $BB^H = I$. Let $B = (I + A)(I - A)^{-1}$. First, show that $-1$ cannot be an eigenvalue of such a $B$. An eigenvalue $\lambda^2$ of $B$ satisfies

$$\det[(I + A)(I - A)^{-1} - \lambda^2I] = 0 = \det\{[(I + A) - \lambda^2(I - A)](I - A)^{-1}\}.$$

Assume $\det(I - A) \neq 0$. Then

$$\det[(I + A) - \lambda^2(I - A)] = \det[(1 + \lambda^2)A + (1 - \lambda^2)I] = 0.$$

Let $\lambda^2 = -1$. Then $\det[2I] = 2^m \neq 0$, where $\operatorname{rank}(I) = m$. This contradicts $\det[B - \lambda^2I] = 0$, so $\lambda^2 \neq -1$.

Now, consider $BB^H = I$, which implies $\det(BB^H) = 1$, or $\det B = e^{i\theta}$:

$$\det[(I + A)(I - A)^{-1}\{(I + A)(I - A)^{-1}\}^H] = 1$$

$$= \det[(I + A)(I - A)^{-1}(I - A)^{-H}(I + A)^H]$$

$$= [\det(I + A)][\det(I - A)]^{-1}[\det(I - A)^H]^{-1}[\det(I + A)^H].$$

This implies

$$[\det(I + A)][\det(I + A)^H] = [\det(I - A)][\det(I - A)^H],$$

which in turn implies

$$\det[(I + A)(I + A)^H] = \det[(I - A)(I - A)^H],$$

that is,

$$\det(I + A + A^H + AA^H) = \det(I - A - A^H + AA^H).$$

This is true when $A^H = -A$. □
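Propositions 59 and 60 describe the Cayley transform between skew-Hermitian and unitary matrices. The inverse map $A = (B - I)(B + I)^{-1}$ used below is the standard inverse Cayley transform; it is our addition, not part of the proof above, and it gives one explicit skew-Hermitian $A$ for proposition 60. A sketch assuming NumPy, with hypothetical function names:

```python
import numpy as np

def cayley(A):
    """(I + A)(I - A)^{-1}: unitary when A is skew-Hermitian (proposition 59)."""
    I = np.eye(A.shape[0])
    return (I + A) @ np.linalg.inv(I - A)

def inverse_cayley(B):
    """(B - I)(B + I)^{-1}: assumes -1 is not an eigenvalue of B."""
    I = np.eye(B.shape[0])
    return (B - I) @ np.linalg.inv(B + I)

rng = np.random.default_rng(10)
G = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
S = G - G.conj().T                   # skew-Hermitian: S^H = -S
B = cayley(S)                        # unitary
A = inverse_cayley(B)                # recovers a skew-Hermitian A with cayley(A) = B
```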
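K.4.2 Partitioned Matrix Determinants

Lemma 45 Let $B$ be a complex square matrix partitioned as

$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

and let $B_{11}^{-1}$ exist. Then

$$\det(B) = \det(B_{11})\det(B_{22} - B_{21}B_{11}^{-1}B_{12}).$$

This is a complexification of Graybill (p. 184) theorem 8.2.1(3) [95].

Proof. I provided the following derivation based on Graybill's derivation of lemma 46. Since $1 = \det(B_{11})\det(B_{11}^{-1})$, we can write

$$\det(B) = \det(B_{11}^{-1})\det(B)\det(B_{11}).$$

Let $A$ denote the block lower-triangular matrix below, whose determinant is $\det(B_{11}^{-1})$. Thus

$$\det(B_{11}^{-1})\det(B) = \det\begin{pmatrix} B_{11}^{-1} & 0 \\ -B_{21}B_{11}^{-1} & I \end{pmatrix}\det\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \det(A)\det(B).$$

Since $A$ and $B$ are conformable, $\det(A)\det(B) = \det(AB)$. Therefore

$$\det(B_{11}^{-1})\det(B) = \det\left[\begin{pmatrix} B_{11}^{-1} & 0 \\ -B_{21}B_{11}^{-1} & I \end{pmatrix}\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}\right] = \det\begin{pmatrix} I & B_{11}^{-1}B_{12} \\ -B_{21} + B_{21} & B_{22} - B_{21}B_{11}^{-1}B_{12} \end{pmatrix}.$$

Finally,

$$\det(B_{11}^{-1})\det(B) = \det\begin{pmatrix} I & B_{11}^{-1}B_{12} \\ 0 & B_{22} - B_{21}B_{11}^{-1}B_{12} \end{pmatrix} = \det(B_{22} - B_{21}B_{11}^{-1}B_{12}),$$

so that

$$\det(B) = \det(B_{11}^{-1})\det(B)\det(B_{11}) = \det(B_{11})\det(B_{22} - B_{21}B_{11}^{-1}B_{12}).$$

Since determinants are scalars, they commute. □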


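Lemma 46 Let $B$ be a complex square matrix partitioned as

$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$

and let $B_{22}^{-1}$ exist. Then

$$\det(B) = \det(B_{22})\det(B_{11} - B_{12}B_{22}^{-1}B_{21}).$$

This is a complexification of Graybill (p. 184) theorem 8.2.1(2) [95].

Proof. This derivation is taken from Graybill. Write

$$\det(B) = \det(B_{22})\det(B)\det(B_{22}^{-1}).$$

Then

$$\det(B)\det(B_{22}^{-1}) = \det\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}\det\begin{pmatrix} I & 0 \\ -B_{22}^{-1}B_{21} & B_{22}^{-1} \end{pmatrix} = \det\left[\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}\begin{pmatrix} I & 0 \\ -B_{22}^{-1}B_{21} & B_{22}^{-1} \end{pmatrix}\right]$$

$$= \det\begin{pmatrix} B_{11} - B_{12}B_{22}^{-1}B_{21} & B_{12}B_{22}^{-1} \\ B_{21} - B_{22}B_{22}^{-1}B_{21} & B_{22}B_{22}^{-1} \end{pmatrix} = \det\begin{pmatrix} B_{11} - B_{12}B_{22}^{-1}B_{21} & B_{12}B_{22}^{-1} \\ 0 & I \end{pmatrix} = \det(B_{11} - B_{12}B_{22}^{-1}B_{21}).$$

Therefore

$$\det(B) = \det(B_{22})\det(B_{11} - B_{12}B_{22}^{-1}B_{21}).$$

□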
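Both Schur-complement formulas (lemmas 45 and 46) can be checked against a direct determinant. A minimal NumPy sketch; `schur_dets` is a hypothetical name and `k` is the assumed size of the leading block $B_{11}$.

```python
import numpy as np

def schur_dets(B, k):
    """det(B) via lemma 45 (pivot on B11) and via lemma 46 (pivot on B22)."""
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    det = np.linalg.det
    via45 = det(B11) * det(B22 - B21 @ np.linalg.solve(B11, B12))   # B22 - B21 B11^{-1} B12
    via46 = det(B22) * det(B11 - B12 @ np.linalg.solve(B22, B21))   # B11 - B12 B22^{-1} B21
    return via45, via46

rng = np.random.default_rng(11)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
via45, via46 = schur_dets(B, 2)
```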


Proposition 61 Let $A$, $B$, and $I$ be $n \times n$ matrices. Then

$$\det\begin{pmatrix} A & I \\ B & I \end{pmatrix} = \det(A - B).$$

Proof. This is a simple application of lemma 46:

$$\det\begin{pmatrix} A & I \\ B & I \end{pmatrix} = \det(I)\det(A - I\,I^{-1}B) = \det(A - B).$$

Proposition 62 Let $A$, $B$, and $I$ be $n \times n$ matrices. Then

$$\det\begin{pmatrix} A & B \\ I & I \end{pmatrix} = \det(A - B).$$

Proof. By lemma 46,

$$\det\begin{pmatrix} A & B \\ I & I \end{pmatrix} = \det(I)\det(A - B\,I^{-1}I) = \det(A - B).$$


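Proposition 63 Let $A$ and $B$ be square matrices, not necessarily the same size. Then

$$\det\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} = \det(A)\det(B).$$

Proof. By lemma 45 (when $A^{-1}$ exists),

$$\det\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} = \det(A)\det(B - 0A^{-1}0) = \det(A)\det(B).$$

□

K.4.3 Other Determinants

Lemma 47 Let $A \in \mathbb{C}^{n \times m}$ and $B \in \mathbb{C}^{m \times n}$. Then

$$\det(I_n + AB) = \det(I_m + BA).$$

This is Eaton's proposition 1.35 (p. 43) [74].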
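Lemma 47 (often called Sylvester's determinant identity) trades an $n \times n$ determinant for an $m \times m$ one. A minimal NumPy check with random complex factors (sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(12)
n, m = 4, 2
A = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

lhs = np.linalg.det(np.eye(n) + A @ B)   # n x n determinant
rhs = np.linalg.det(np.eye(m) + B @ A)   # m x m determinant
print(np.isclose(lhs, rhs))
```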
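Lemma 48 Let $\alpha, \varepsilon \in \mathbb{C}$ be nonzero, and let matrices $B, C, D$ be such that $BCD \in \mathbb{C}^{k \times k}$ and $CDB \in \mathbb{C}^{n \times n}$. Then

$$\det\left(\tfrac{1}{\alpha}BCD - \varepsilon I_k\right) = (-1)^{n-k}\frac{\varepsilon^k}{\alpha^n}\det\left(\tfrac{1}{\varepsilon}CDB - \alpha I_n\right).$$

This is a corollary to Eaton's lemma 1.35 [74], supplied by me.

Proof. This is a simple application of lemma 45 and lemma 46:

$$\det\begin{pmatrix} \alpha I_n & CD \\ B & \varepsilon I_k \end{pmatrix} = \det(\alpha I_n)\det\left(\varepsilon I_k - B(\alpha I_n)^{-1}CD\right) = \det(\varepsilon I_k)\det\left(\alpha I_n - CD(\varepsilon I_k)^{-1}B\right).$$

This implies

$$\alpha^n\det\left(\varepsilon I_k - \tfrac{1}{\alpha}BCD\right) = \varepsilon^k\det\left(\alpha I_n - \tfrac{1}{\varepsilon}CDB\right).$$

From this we get

$$(-1)^k\alpha^n\det\left(\tfrac{1}{\alpha}BCD - \varepsilon I_k\right) = (-1)^n\varepsilon^k\det\left(\tfrac{1}{\varepsilon}CDB - \alpha I_n\right).$$

We finally get

$$\det\left(\tfrac{1}{\alpha}BCD - \varepsilon I_k\right) = (-1)^{n-k}\frac{\varepsilon^k}{\alpha^n}\det\left(\tfrac{1}{\varepsilon}CDB - \alpha I_n\right).$$

□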

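Proposition 64 Let $\Sigma_{k \times k} > 0$ and $X_{k \times n}$. Then $\det(XX^H - \lambda^2\Sigma) = 0$ implies that

$$\det(X^H\Sigma^{-1}X - \lambda^2I_n) = 0.$$

This is a corollary to Eaton's lemma 1.35 [74], supplied by me.

Proof. This is a simple application of lemma 48 (and hence of lemmas 45 and 46).

$$\det(XX^H - \lambda^2\Sigma) = \det[(XX^H\Sigma^{-1} - \lambda^2I_k)\Sigma] = \det(XX^H\Sigma^{-1} - \lambda^2I_k)\det(\Sigma)$$

$$= (-1)^{n-k}\lambda^{2k}\det(\lambda^{-2}X^H\Sigma^{-1}X - I_n)\det(\Sigma)$$

by lemma 48 (with $\alpha = 1$, $\varepsilon = \lambda^2$, $B = X$, $C = X^H$, $D = \Sigma^{-1}$). This equals

$$(-1)^{n-k}\lambda^{2(k-n)}\det(X^H\Sigma^{-1}X - \lambda^2I_n)\det(\Sigma).$$

By our hypothesis, this is zero. If we assume that $\lambda^2 \neq 0$ and $\Sigma > 0$, then we conclude that

$$\det(X^H\Sigma^{-1}X - \lambda^2I_n) = 0.$$

□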
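Proposition 64 says every nonzero root $\lambda^2$ of $\det(XX^H - \lambda^2\Sigma) = 0$ (a $k \times k$ problem) is also an eigenvalue of the $n \times n$ matrix $X^H\Sigma^{-1}X$. A sketch assuming NumPy; the function name, the tolerance, and the random $\Sigma = GG^H + I$ are illustrative assumptions.

```python
import numpy as np

def prop64_roots_match(X, Sigma, rtol=1e-8):
    """Each root of det(X X^H - lam2 Sigma) = 0 should be an eigenvalue of X^H Sigma^{-1} X."""
    lam2 = np.linalg.eigvals(np.linalg.solve(Sigma, X @ X.conj().T))   # k roots
    mu = np.linalg.eigvals(X.conj().T @ np.linalg.solve(Sigma, X))      # n eigenvalues
    return all(np.min(np.abs(mu - l)) <= rtol * max(1.0, abs(l)) for l in lam2)

rng = np.random.default_rng(13)
k, n = 2, 5
X = rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n))
G = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
Sigma = G @ G.conj().T + np.eye(k)        # Hermitian positive definite
```

The remaining $n - k$ eigenvalues of $X^H\Sigma^{-1}X$ are zero, which is why the hypothesis $\lambda^2 \neq 0$ appears in the proof.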


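Lemma 49 Let

$$\begin{pmatrix} A & C \\ B & D \end{pmatrix}$$

be a partitioned square matrix such that $A^{-1}$ exists. Then

$$\det\begin{pmatrix} A \otimes I_p & C \otimes I_p \\ B \otimes I_p & D \otimes I_p \end{pmatrix} = \left[\det\begin{pmatrix} A & C \\ B & D \end{pmatrix}\right]^p.$$

This was supplied by me.

Proof.

$$\det\begin{pmatrix} A \otimes I_p & C \otimes I_p \\ B \otimes I_p & D \otimes I_p \end{pmatrix} = \det(A \otimes I_p)\det[D \otimes I_p - (B \otimes I_p)(A \otimes I_p)^{-1}(C \otimes I_p)]$$

$$= \det(A \otimes I_p)\det[D \otimes I_p - (B \otimes I_p)(A^{-1} \otimes I_p)(C \otimes I_p)]$$

$$= \det(A \otimes I_p)\det[D \otimes I_p - (BA^{-1} \otimes I_p)(C \otimes I_p)]$$

$$= \det(A \otimes I_p)\det[D \otimes I_p - (BA^{-1}C) \otimes I_p]$$

$$= \det(A \otimes I_p)\det[(D - BA^{-1}C) \otimes I_p]$$

$$= [\det A]^p[\det(D - BA^{-1}C)]^p = [\det(A)\det(D - BA^{-1}C)]^p = \left[\det\begin{pmatrix} A & C \\ B & D \end{pmatrix}\right]^p,$$

where the first and last equalities use lemma 45, and $\det(X \otimes I_p) = (\det X)^p$. □

Note that even though the determinants are equal, the matrices are not equal in general:

$$\begin{pmatrix} A \otimes I_p & C \otimes I_p \\ B \otimes I_p & D \otimes I_p \end{pmatrix} \neq \begin{pmatrix} M & & \\ & \ddots & \\ & & M \end{pmatrix}, \qquad M = \begin{pmatrix} A & C \\ B & D \end{pmatrix}.$$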
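A numerical check of lemma 49, assuming NumPy. NumPy's `np.kron(X, Y)` forms the block matrix $[x_{ij}Y]$, so under that convention the partitioned matrix coincides with `np.kron(M, I_p)`, while the block-diagonal matrix $\operatorname{diag}(M, \dots, M)$ is `np.kron(I_p, M)`; both have determinant $(\det M)^p$ but they are different matrices. Block sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(14)
a, d, p = 2, 3, 3

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

A, C, B, D = crandn(a, a), crandn(a, d), crandn(d, a), crandn(d, d)
M = np.block([[A, C], [B, D]])
I_p = np.eye(p)
K = np.block([[np.kron(A, I_p), np.kron(C, I_p)],
              [np.kron(B, I_p), np.kron(D, I_p)]])
block_diag = np.kron(I_p, M)          # diag(M, ..., M), p copies
```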


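Proposition 65 Let $A$ be a real square matrix. Then

$$\det(I + A^2) = |\det(I + iA)|^2.$$

Proof.

$$\det(I + A^2) = \det[(I + iA)(I - iA)] = \det(I + iA)\det(I - iA) = |\det(I + iA)|^2,$$

where the last step uses the fact that, because $A$ is real, $\det(I - iA) = [\det(I + iA)]^*$ by lemma 42. □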
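The following NumPy sketch (random matrices, hypothetical helper name) shows that the identity of proposition 65 holds for a real $A$ and generally fails once $A$ has a nonzero imaginary part, which is why the realness assumption matters in the last step of the proof.

```python
import numpy as np

def prop65_sides(A):
    """Return (det(I + A^2), |det(I + iA)|^2)."""
    I = np.eye(A.shape[0])
    return np.linalg.det(I + A @ A), abs(np.linalg.det(I + 1j * A)) ** 2

rng = np.random.default_rng(15)
A_real = rng.standard_normal((4, 4))
A_cplx = A_real + 1j * rng.standard_normal((4, 4))
real_lhs, real_rhs = prop65_sides(A_real)
cplx_lhs, cplx_rhs = prop65_sides(A_cplx)
```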


Appendix  L 


GRAM-SCHMIDT 

We examine two Gram-Schmidt algorithms which give different results in the complex case. A discovery from this exercise is that it is possible via Gram-Schmidt to produce an orthonormal basis for a vector space without having the property of an inner product space. The first algorithm will examine the case where we use the sesquilinear operator $\langle x, y\rangle = x^Hy$, which is an inner product. The second algorithm will examine the bilinear operator $(x, y) = x^Ty$, which does, indeed, produce an orthonormal basis, but is not an inner product. The two orthonormal sets produced are generally not the same. The basic algorithm is given as problems 5.1.10 and 5.1.11 of Stewart [259].

L.1 Algorithm Using $\langle x, y\rangle = x^Hy$

The following proof is set in a general Hilbert space $H$, which is a complete inner product space. In our application, we define our inner product as $\langle x, y\rangle = x^Hy$, where $x$ and $y$ are vectors, which may be complex.

Let $\{x_n\}_{n=1}^N$ be a sequence of linearly independent elements (vectors) in Hilbert space $H$. Define the inner product using the engineering convention that the inner product is linear in the second argument. For example, $\langle x, \alpha y\rangle = \alpha\langle x, y\rangle$ and $\langle \alpha x, y\rangle = \alpha^*\langle x, y\rangle$ for $\alpha \in \mathbb{C}$.

This is the reverse of the usual mathematician's convention, but is of greater practical use.

The following Gram-Schmidt orthonormalization process operates on $\{x_n\}_{n=1}^N$ to produce the orthonormal set $\{u_n\}_{n=1}^N$, which has the same span as $\{x_n\}_{n=1}^N$.

The algorithm is as follows.

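L.1.1 Inner Product Gram-Schmidt Algorithm

1 Let $v_1 = x_1$.

2 Let $u_1 = v_1/\|v_1\|$, where $\|v_1\| = \langle v_1, v_1\rangle^{1/2} \in \mathbb{R}^+$.

3 Let $v_n = x_n - \sum_{i=1}^{n-1}\langle u_i, x_n\rangle u_i$.

4 Let $u_n = v_n/\|v_n\|$, where $\|v_n\| = \langle v_n, v_n\rangle^{1/2} \in \mathbb{R}^+$.

Repeat steps 3 and 4 for $2 \le n \le N$.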

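L.1.2 Inner Product Gram-Schmidt Algorithm Proof

First, show

$$\langle u_2, u_1\rangle = 0. \tag{L.1}$$

With $c_{21} = (\|v_2\|\,\|x_1\|)^{-1}$,

$$\langle u_2, u_1\rangle = c_{21}\langle x_2 - \langle u_1, x_2\rangle u_1,\ x_1\rangle = c_{21}\{\langle x_2, x_1\rangle - \langle x_2, u_1\rangle\langle u_1, x_1\rangle\}.$$

Recall that

$$u_1 = \frac{v_1}{\|v_1\|} = \frac{x_1}{\|x_1\|}.$$

Then

$$\langle u_2, u_1\rangle = c_{21}\left\{\langle x_2, x_1\rangle - \frac{\langle x_2, x_1\rangle\langle x_1, x_1\rangle}{\|x_1\|^2}\right\} = c_{21}\{\langle x_2, x_1\rangle - \langle x_2, x_1\rangle\} = 0.$$

Therefore $\langle u_2, u_1\rangle = 0$. Now, show

$$\langle u_1, u_1\rangle = \left\langle \frac{v_1}{\|v_1\|}, \frac{v_1}{\|v_1\|}\right\rangle = \frac{\langle v_1, v_1\rangle}{\|v_1\|^2} = 1. \tag{L.2}$$

For $i, j \le n - 1$, assume

$$\langle u_i, u_j\rangle = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$$

Then, for $j \le n - 1$,

$$\langle u_n, u_j\rangle = \frac{1}{\|v_n\|}\left\langle x_n - \sum_{i=1}^{n-1}\langle u_i, x_n\rangle u_i,\ u_j\right\rangle = \frac{1}{\|v_n\|}\left\{\langle x_n, u_j\rangle - \sum_{i=1}^{n-1}\langle x_n, u_i\rangle\langle u_i, u_j\rangle\right\} = \frac{1}{\|v_n\|}\{\langle x_n, u_j\rangle - \langle x_n, u_j\rangle\} = 0.$$

Thus $\langle u_n, u_j\rangle = 0$. Now, show

$$\langle u_n, u_n\rangle = \left\langle \frac{v_n}{\|v_n\|}, \frac{v_n}{\|v_n\|}\right\rangle = 1. \tag{L.3}$$

Therefore by induction, the algorithm produces an orthonormal set $\{u_n\}$ for all $n \in \mathbb{Z}^+$ (up to $N$). For the inner product, we see that the orthonormal set is also unitary: the matrix $U = (u_1, \dots, u_N)$ satisfies $U^HU = I_N$.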
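Algorithm L.1.1 can be sketched in a few lines, assuming NumPy. `np.vdot(a, b)` conjugates its first argument, so it computes $a^Hb = \langle a, b\rangle$ in the engineering convention used above; the function name and test matrix are our own illustrations.

```python
import numpy as np

def gram_schmidt_inner(X):
    """Columns of X are x_1..x_N (linearly independent); returns U with U^H U = I."""
    X = np.asarray(X, dtype=complex)
    U = np.zeros_like(X)
    for n in range(X.shape[1]):
        v = X[:, n].copy()
        for i in range(n):
            v -= np.vdot(U[:, i], X[:, n]) * U[:, i]   # <u_i, x_n> u_i, with <u, x> = u^H x
        U[:, n] = v / np.sqrt(np.vdot(v, v).real)      # ||v_n|| = <v_n, v_n>^{1/2}
    return U

rng = np.random.default_rng(16)
X = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
U = gram_schmidt_inner(X)
```

This is the classical (not the modified) Gram-Schmidt order of operations, matching step 3 as written; it is adequate for illustration but loses orthogonality for ill-conditioned inputs.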


L.2 Algorithm Using $(x, y) = x^Ty$

The Gram-Schmidt process applied to a complex vector space $\mathbb{C}^n$ usually involves using the inner product. However, the inner product is not the only function of the two vectors that will produce a decomposition. For example, define $(x, y) = x^Ty$, where $x, y \in \mathbb{C}^n$. $(x, y)$ does not define an inner product: we cannot assert that $(x, x) > 0$ for all nonzero $x \in \mathbb{C}^n$. However, the function $(x, y) = x^Ty$ can successfully be used in the Gram-Schmidt process to produce an orthonormal basis, provided that no $(v_k, v_k)$ arising in the algorithm vanishes. In fact, let $(x, y)$ be any operator such that

$$(\alpha x, y) = \alpha(x, y) = (x, \alpha y),$$

$$(x + y, z) = (x, z) + (y, z),$$

where $x, y, z \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$. The algorithm is the "same" as before.

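L.2.1 Bilinear Gram-Schmidt Algorithm

Let $\{x_i\}_{i=1}^n$ be a set of linearly independent vectors in $\mathbb{C}^n$. The algorithm is as follows.

1 Let $v_1 = x_1$.

2 Let $u_1 = v_1/(v_1, v_1)^{1/2}$, where $(v_1, v_1)^{1/2} \in \mathbb{C}$.

3 Let $v_k = x_k - \sum_{i=1}^{k-1}(x_k, u_i)u_i$.

4 Let $u_k = v_k/(v_k, v_k)^{1/2}$, where $(v_k, v_k)^{1/2} \in \mathbb{C}$.

Repeat steps 3 and 4 for $2 \le k \le n$.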
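L.2.2 Proof

The proof closely follows the one using the inner product. First, show

$$(u_2, u_1) = 0. \tag{L.4}$$

With $c_{21} = [(v_2, v_2)(x_1, x_1)]^{-1/2}$,

$$(u_2, u_1) = c_{21}(x_2 - (x_2, u_1)u_1,\ x_1) = c_{21}\{(x_2, x_1) - (x_2, u_1)(u_1, x_1)\}.$$

Recall that

$$u_1 = \frac{v_1}{(v_1, v_1)^{1/2}} = \frac{x_1}{(x_1, x_1)^{1/2}}.$$

Then we obtain

$$(u_2, u_1) = c_{21}\left\{(x_2, x_1) - \frac{(x_2, x_1)(x_1, x_1)}{(x_1, x_1)}\right\} = c_{21}\{(x_2, x_1) - (x_2, x_1)\} = 0.$$

Therefore

$$(u_2, u_1) = 0. \tag{L.5}$$

Now, show $(u_1, u_1) = 1$:

$$(u_1, u_1) = \left(\frac{v_1}{(v_1, v_1)^{1/2}}, \frac{v_1}{(v_1, v_1)^{1/2}}\right) = \frac{(v_1, v_1)}{(v_1, v_1)} = 1.$$

For $i, j \le n - 1$, assume

$$(u_i, u_j) = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$$

Then, for $j \le n - 1$,

$$(u_n, u_j) = \left(\frac{1}{(v_n, v_n)^{1/2}}\Big(x_n - \sum_{i=1}^{n-1}(x_n, u_i)u_i\Big),\ u_j\right) = \frac{(x_n, u_j) - \sum_{i=1}^{n-1}(x_n, u_i)(u_i, u_j)}{(v_n, v_n)^{1/2}} = \frac{(x_n, u_j) - (x_n, u_j)}{(v_n, v_n)^{1/2}} = 0.$$

Thus $(u_n, u_j) = 0$ when $j \neq n$. Now, show

$$(u_n, u_n) = \left(\frac{v_n}{(v_n, v_n)^{1/2}}, \frac{v_n}{(v_n, v_n)^{1/2}}\right) = 1.$$

Therefore by induction, the algorithm produces an orthonormal set $\{u_n\}$ for all $n \in \mathbb{Z}^+$. Compared to the previous Gram-Schmidt algorithm, note that in general

$$(v_k, v_k) = v_k^Tv_k \neq v_k^Hv_k = \langle v_k, v_k\rangle. \tag{L.7}$$

Thus the two orthonormal sets are not generally the same.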
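A sketch of the bilinear algorithm L.2.1, assuming NumPy. The only changes from the inner-product version are that `a @ b` (no conjugation) computes $(a, b) = a^Tb$ and that the normalizer $(v_k, v_k)^{1/2}$ is a complex square root; we take NumPy's principal branch, and either branch gives a valid $u_k$. The explicit breakdown check is our addition: a nonzero vector such as $(1, i)^T$ has $(x, x) = 1 + i^2 = 0$, so step 2 cannot be carried out for it.

```python
import numpy as np

def gram_schmidt_bilinear(X, tol=1e-12):
    """Columns of X are x_1..x_n; returns U with U^T U = I (not U^H U = I)."""
    X = np.asarray(X, dtype=complex)
    U = np.zeros_like(X)
    for k in range(X.shape[1]):
        v = X[:, k].copy()
        for i in range(k):
            v -= (X[:, k] @ U[:, i]) * U[:, i]   # (x_k, u_i) u_i with (a, b) = a^T b
        vv = v @ v                               # (v_k, v_k): complex in general
        if abs(vv) < tol:
            raise ValueError("(v_k, v_k) = 0: bilinear Gram-Schmidt breaks down")
        U[:, k] = v / np.sqrt(vv)                # principal complex square root
    return U

rng = np.random.default_rng(17)
X = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
U = gram_schmidt_bilinear(X)
```

Comparing `U` with the output of the inner-product version on the same `X` illustrates (L.7): the two sets span the same space but are generally different.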


Appendix  M 
COMPLEX MATRIX DECOMPOSITIONS AND EIGENVALUES

This appendix contains various decompositions of complex matrices. Most decompositions are related to the eigenvalue decomposition of a Hermitian positive definite matrix. Also included are some decompositions of triangular and rectangular matrices. Some of the proofs are given as algorithms or constructions. There are also a number of theorems that describe or exploit properties of eigenvalues.

Most of the theorems are straightforward adaptations of similar theorems for the case of real matrices. In generalizing, special attention is required when the real case specifies uniqueness up to $\pm 1$. In the complex case, this sometimes generalizes to $e^{i\theta}$ for arbitrary $\theta \in \mathbb{R}$. Distinction is also required between symmetric and Hermitian complex matrices. It is shown, for example, that you cannot assume a symmetric complex matrix is a definite matrix, or even that its eigenvalues are all real. Recall that the present literature about zonal polynomials for complex matrices assumes complex symmetric matrices.

The decompositions are needed to support the work on Jacobians and the development of distributional results.
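M.1 Decomposition to a Product of a Triangular Matrix

Proposition 66 Let $T$ be an $n \times n$ complex upper triangular matrix with distinct diagonal elements. Let $\Lambda^2 = \operatorname{diag}(T_{11}, \dots, T_{nn})$. Then there exists a nonsingular upper triangular matrix $C$ satisfying $CT = \Lambda^2C$. $C$ is uniquely determined up to a (possibly different) multiplicative constant for each row. Row $k$ of $C$ is the left eigenvector of $T$ corresponding to eigenvalue $\lambda_k^2$. This is a complexification of Takemura lemma 3.1.1 (p. 17) [265], stated without proof.

Proof. The expansion of the identity $CT = \Lambda^2C$ is

$$CT = \begin{pmatrix} C_{11}T_{11} & C_{11}T_{12} + C_{12}T_{22} & C_{11}T_{13} + C_{12}T_{23} + C_{13}T_{33} & \cdots & \sum_{k=1}^n C_{1k}T_{kn} \\ 0 & C_{22}T_{22} & C_{22}T_{23} + C_{23}T_{33} & \cdots & \sum_{k=2}^n C_{2k}T_{kn} \\ 0 & 0 & C_{33}T_{33} & \cdots & \sum_{k=3}^n C_{3k}T_{kn} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & C_{nn}T_{nn} \end{pmatrix}$$

$$= \Lambda^2C = \begin{pmatrix} \lambda_1^2C_{11} & \lambda_1^2C_{12} & \lambda_1^2C_{13} & \cdots & \lambda_1^2C_{1n} \\ 0 & \lambda_2^2C_{22} & \lambda_2^2C_{23} & \cdots & \lambda_2^2C_{2n} \\ 0 & 0 & \lambda_3^2C_{33} & \cdots & \lambda_3^2C_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^2C_{nn} \end{pmatrix}.$$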
Table M.1. $CT = \Lambda^2C$ Decomposition Pseudo-Code

C     C, T and SUM are COMPLEX; fix the arbitrary diagonal of C at 1
      DO 5 J=1,N
    5 C(J,J)=(1.0,0.0)
C     Increment by rows
      DO 10 J=1,N-1
C       Increment by columns
        DO 20 M=J+1,N
C         Compute C(J,M)
          SUM=(0.0,0.0)
          DO 30 K=J,M-1
   30     SUM=SUM+C(J,K)*T(K,M)
          C(J,M)=SUM/(T(J,J)-T(M,M))
   20   CONTINUE
   10 CONTINUE

By construction, $\lambda_k^2 = T_{kk}$ for $1 \le k \le n$. Each row of $C$ may be determined independently from all other rows. Equating the $(J, M)$ entries of $CT$ and $\Lambda^2C$ gives $\sum_{K=J}^{M}C_{JK}T_{KM} = T_{JJ}C_{JM}$; moving the $K = M$ term to the right-hand side and solving for $C_{JM}$ yields the update used in the algorithm, which is given in table M.1. This computes

$$C_{JM} = \frac{\sum_{K=J}^{M-1}C_{JK}T_{KM}}{T_{JJ} - T_{MM}}$$

for $1 \le J \le N - 1$ and $J + 1 \le M \le N$, where we used $\lambda_J^2 = T_{JJ}$ in the derivation. The values of $C_{JJ}$ are arbitrary (but nonzero, so that $C$ is nonsingular), and $C_{JJ}$ is independent of $C_{KK}$ for $J \neq K$. For a fixed diagonal of $C$, the other entries of $C$ are unique. The $(C_{JJ})$ may be chosen to minimize numerical error in computations. Alternately, the computations may be slightly simplified by arbitrarily setting $C_{JJ} = 1$ for all $J$. □
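For readers without a Fortran toolchain, table M.1 translates directly into Python. This is a sketch assuming NumPy, with 0-based indices and $C_{JJ} = 1$; the function name is ours, and distinct diagonal entries of $T$ are assumed, as in proposition 66.

```python
import numpy as np

def left_triangular_eigvecs(T):
    """Upper-triangular C with C T = Lambda^2 C, Lambda^2 = diag(T), C_JJ = 1 (table M.1)."""
    n = T.shape[0]
    C = np.eye(n, dtype=complex)                 # arbitrary diagonal fixed at 1
    for j in range(n - 1):                       # rows
        for m in range(j + 1, n):                # columns right of the diagonal
            s = sum(C[j, k] * T[k, m] for k in range(j, m))
            C[j, m] = s / (T[j, j] - T[m, m])    # needs distinct diagonal entries
    return C

rng = np.random.default_rng(18)
T = np.triu(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
C = left_triangular_eigvecs(T)
```

Each row $j$ only reads entries of the same row, which is why the text notes that rows can be computed independently.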


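Lemma 50 Let $T$ be an $n \times n$ complex upper triangular matrix with distinct diagonal elements. Let

$$\Lambda^2 = \operatorname{diag}(T_{11}, \dots, T_{nn}).$$

Then there exists a nonsingular upper triangular matrix $C$ satisfying $TC = C\Lambda^2$. $C$ is uniquely determined up to a (possibly different) multiplicative constant for each column. Column $k$ of $C$ is the right eigenvector of $T$ corresponding to eigenvalue $\lambda_k^2$. This is a corollary to the complexification of Takemura lemma 3.1.1 [265].

Proof. Examine the structure of the following matrices:

$$C\Lambda^2 = \begin{pmatrix} C_{11}\lambda_1^2 & C_{12}\lambda_2^2 & C_{13}\lambda_3^2 & \cdots & C_{1n}\lambda_n^2 \\ 0 & C_{22}\lambda_2^2 & C_{23}\lambda_3^2 & \cdots & C_{2n}\lambda_n^2 \\ 0 & 0 & C_{33}\lambda_3^2 & \cdots & C_{3n}\lambda_n^2 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & C_{nn}\lambda_n^2 \end{pmatrix}.$$

We also have

$$TC = \begin{pmatrix} T_{11}C_{11} & T_{11}C_{12} + T_{12}C_{22} & T_{11}C_{13} + T_{12}C_{23} + T_{13}C_{33} & \cdots & \sum_{k=1}^n T_{1k}C_{kn} \\ 0 & T_{22}C_{22} & T_{22}C_{23} + T_{23}C_{33} & \cdots & \sum_{k=2}^n T_{2k}C_{kn} \\ 0 & 0 & T_{33}C_{33} & \cdots & \sum_{k=3}^n T_{3k}C_{kn} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & T_{nn}C_{nn} \end{pmatrix}.$$

Table M.2. $TC = C\Lambda^2$ Decomposition Pseudo-Code

C     C, T and SUM are COMPLEX; fix the arbitrary diagonal of C at 1
      DO 5 K=1,N
    5 C(K,K)=(1.0,0.0)
C     Increment by columns
      DO 10 M=2,N
C       Increment by rows, from the diagonal up to the top row
        DO 20 J=M-1,1,-1
C         Compute C(J,M)
          SUM=(0.0,0.0)
          DO 30 K=J+1,M
   30     SUM=SUM+T(J,K)*C(K,M)
          C(J,M)=SUM/(T(M,M)-T(J,J))
   20   CONTINUE
   10 CONTINUE

By construction, $\lambda_k^2 = T_{kk}$ for $1 \le k \le n$, and the values of $C_{kk}$ are arbitrary (but nonzero). Construction of $C_{JK}$ proceeds one column at a time from left to right, working from the diagonal to the top row. The algorithm is given in table M.2. This computes

$$C_{JM} = \frac{\sum_{K=J+1}^{M}T_{JK}C_{KM}}{T_{MM} - T_{JJ}}$$

for $2 \le M \le N$ and $M - 1 \ge J \ge 1$. The values of $C_{JJ}$ are arbitrary, and $C_{JJ}$ is independent of $C_{KK}$ for $J \neq K$. For a fixed diagonal of $C$, the other entries of $C$ are unique.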
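Table M.2 can likewise be sketched in Python (NumPy assumed, 0-based indices, $C_{JJ} = 1$, hypothetical function name). The inner loop runs upward so that every $C_{KM}$ with $K > J$ in the current column is already available.

```python
import numpy as np

def right_triangular_eigvecs(T):
    """Upper-triangular C with T C = C Lambda^2, Lambda^2 = diag(T), C_JJ = 1 (table M.2)."""
    n = T.shape[0]
    C = np.eye(n, dtype=complex)
    for m in range(1, n):                        # columns, left to right
        for j in range(m - 1, -1, -1):           # rows, from the diagonal up to the top
            s = sum(T[j, k] * C[k, m] for k in range(j + 1, m + 1))
            C[j, m] = s / (T[m, m] - T[j, j])
    return C

rng = np.random.default_rng(19)
T = np.triu(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
C = right_triangular_eigvecs(T)
```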


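Proposition 67 Let $A$ be an $m \times n$ complex matrix of full column rank $n$. Then there exists an upper $n \times n$ triangular matrix $T$ having positive real elements on the diagonal, and a subunitary $m \times n$ matrix $S$ such that $S^HS = I_n$ and $A = ST$. This is C. R. Rao equation 1b.2(ix) [213]. This is one version of a QR decomposition.

Proof. This proof is by C. R. Rao. Let $(\alpha_1, \dots, \alpha_n)$ be the columns of $A$. Let $(\sigma_1, \dots, \sigma_n)$ be the columns of $S$. Let $(t_{1i}, \dots, t_{ii})$ be the nonzero elements of the $i$th column of $T$. Then we have

$$(\alpha_1, \alpha_2, \dots, \alpha_n) = (\sigma_1, \dots, \sigma_n)\begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ & t_{22} & \cdots & t_{2n} \\ & & \ddots & \vdots \\ & & & t_{nn} \end{pmatrix} = (\sigma_1t_{11},\ \sigma_1t_{12} + \sigma_2t_{22},\ \dots,\ \sigma_1t_{1n} + \sigma_2t_{2n} + \cdots + \sigma_nt_{nn}).$$

From this, we see

$$\alpha_1 = \sigma_1t_{11}, \quad \alpha_2 = \sigma_1t_{12} + \sigma_2t_{22}, \quad \dots, \quad \alpha_i = \sigma_1t_{1i} + \sigma_2t_{2i} + \cdots + \sigma_it_{ii}$$

for $1 \le i \le n$. Let $\sigma_i^H\sigma_j = \delta_{ij}$. Then $\sigma_i^H\alpha_j = t_{ij}$ for $i < j$, and

$$\alpha_i^H\alpha_i = |t_{1i}|^2 + \cdots + |t_{ii}|^2, \qquad t_{ii} = \left[\alpha_i^H\alpha_i - |t_{1i}|^2 - \cdots - |t_{i-1,i}|^2\right]^{1/2}.$$

Proceeding column by column, beginning with $\alpha_1$, $S$ and $T$ are constructed as follows:

$$t_{11} = (\alpha_1^H\alpha_1)^{1/2}, \qquad \sigma_1 = \frac{1}{t_{11}}\alpha_1,$$

$$t_{12} = \sigma_1^H\alpha_2, \qquad t_{22} = \left[\alpha_2^H\alpha_2 - |t_{12}|^2\right]^{1/2}, \qquad \sigma_2 = \frac{1}{t_{22}}[\alpha_2 - \sigma_1t_{12}],$$

and in general

$$t_{ij} = \sigma_i^H\alpha_j, \quad i \in [1, j - 1] \subset \mathbb{N},$$

$$t_{jj} = \left[\alpha_j^H\alpha_j - |t_{1j}|^2 - |t_{2j}|^2 - \cdots - |t_{j-1,j}|^2\right]^{1/2}, \quad j \in [2, n] \subset \mathbb{N},$$

$$\sigma_j = \frac{1}{t_{jj}}\left[\alpha_j - \sigma_1t_{1j} - \sigma_2t_{2j} - \cdots - \sigma_{j-1}t_{j-1,j}\right].$$

Thus we have constructed the required $S$ and $T$ such that $A = ST$, $T$ is $n \times n$ upper triangular with a positive real diagonal, and $S^HS = I_n$. □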
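Rao's column-by-column construction can be sketched as follows, assuming NumPy (function name ours; $A$ is assumed to have full column rank). It is classical Gram-Schmidt written in terms of $t_{ij} = \sigma_i^H\alpha_j$ and the norm formula for $t_{jj}$, so the diagonal of $T$ comes out real and positive by construction.

```python
import numpy as np

def rao_qr(A):
    """A = S T with S^H S = I_n and T upper triangular with positive real diagonal."""
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    S = np.zeros((m, n), dtype=complex)
    T = np.zeros((n, n), dtype=complex)
    for j in range(n):
        for i in range(j):
            T[i, j] = np.vdot(S[:, i], A[:, j])                   # t_ij = sigma_i^H alpha_j
        t2 = np.vdot(A[:, j], A[:, j]).real - np.sum(np.abs(T[:j, j]) ** 2)
        T[j, j] = np.sqrt(t2)                                     # positive real diagonal
        S[:, j] = (A[:, j] - S[:, :j] @ T[:j, j]) / T[j, j]       # sigma_j
    return S, T

rng = np.random.default_rng(20)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
S, T = rao_qr(A)
```

`np.linalg.qr` returns an equivalent factorization up to unit-modulus phases on the diagonal of its triangular factor; the explicit loop is kept here because it mirrors the proof.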

Proposition 68 Let $A$ be an $m \times n$ complex matrix of full column rank $n$. Then there exists a subunitary matrix $S$ of size $m \times n$, where $S^HS = I_n$, an $n \times n$ nonsingular upper triangular matrix $C$, and a diagonal matrix $\Lambda^2$ such that $AC = SC\Lambda^2$, or $A = SC\Lambda^2C^{-1}$, provided the diagonal elements of the triangular factor of proposition 67 are distinct (as lemma 50 requires).

Proof. By C. R. Rao equation 1b.2(ix) [213], we obtain an $m \times n$ subunitary matrix $S$, where $S^HS = I_n$, and an upper $n \times n$ triangular matrix $T$ with positive real diagonal elements such that $A = ST$. By lemma 50, we have a nonsingular upper triangular matrix $C$ satisfying $TC = C\Lambda^2$, where column $k$ of $C$ is the right eigenvector of $T$ corresponding to eigenvalue $\lambda_k^2$. Thus $AC = STC = SC\Lambda^2$. □

Proposition  69  Let  A  be  an  m  x  n  complex  matrix.  Then  there  exists  an 
upper  triangular  mxm  matrix  T  having  positive  real  elements  on  the  diagonal, 
and  a  subunitary  m  x  n  matrix  S  such  that  SS^  =  Im  and  A  =  TS.  This  is 
motivated  by  C.  R.  Rao  equation  lb.2(ix)  [213].  This  is  a  version  of  a  QR 
decomposition,  except  that  the  orthonormal  basis  matrix  is  now  on  the  right 
side,  and  it  is  the  set  of  rows  of  S  that  form  the  basis. 

Proof.  I  have  followed  the  proof  is  almost  exactly  as  C.  R.  Rao’s  proof  in 
proposition  67.  Let  be  the  rows  of  A.  Let  (<Ti, •  •  •  ,(T,„)  be  the 

rows  of  S.  Let  (ti,-,  •  •  • ,  t,m)  be  the  P*  row  of  T.  Then  we  have 


/  ^ 

/  ^ 

( 

til  hi  •••  tlm 

«2 

= 

t22  •  •  •  ^2m 

<72 

^  Imm  ! 

\  / 
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Thus,  we  see  that  the  rows  of  A  are 


—  tmm^m 


^m—1  —  A 


"h  ^«,»+l^i+l  "h  ■  '  '  “h  tim^n 


ai  —  +  ti2(T2  +  •  •  •  +  timCT„ 


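The proof is by construction. We constrain the construction by requiring
$$\sigma_i\sigma_j^H = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$
Given this constraint, we construct the T that satisfies $A = TS$, and in the process we also explicitly find S. To begin with, notice from the expansion of $\alpha_i$ that if we post-multiply by $\sigma_j^H$, where $i \neq j$, we get
$$\alpha_i\sigma_j^H = t_{ii}\sigma_i\sigma_j^H + t_{i,i+1}\sigma_{i+1}\sigma_j^H + \cdots + t_{im}\sigma_m\sigma_j^H = t_{ij}$$
Also note that
$$\alpha_i\alpha_i^H = |t_{ii}|^2 + |t_{i,i+1}|^2 + \cdots + |t_{im}|^2$$
Solving the above for $t_{ij}$ and $t_{ii}$, we get $t_{ij} = \alpha_i\sigma_j^H$ for $i \neq j$ and
$$t_{ii} = \left[\alpha_i\alpha_i^H - |t_{i,i+1}|^2 - \cdots - |t_{im}|^2\right]^{1/2}$$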
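To see that T and S can actually be constructed by decomposing A, we solve iteratively, as follows. We begin by recalling that $\alpha_m = t_{mm}\sigma_m$, which leads us to the first step in the iteration:
$$t_{mm} = (\alpha_m\alpha_m^H)^{1/2}, \qquad \sigma_m = \frac{1}{t_{mm}}\alpha_m$$
Next,
$$\alpha_{m-1} = t_{m-1,m-1}\sigma_{m-1} + t_{m-1,m}\sigma_m$$
$$t_{m-1,m} = \alpha_{m-1}\sigma_m^H$$
$$t_{m-1,m-1} = \left[\alpha_{m-1}\alpha_{m-1}^H - |t_{m-1,m}|^2\right]^{1/2}$$
$$\sigma_{m-1} = \frac{1}{t_{m-1,m-1}}\left[\alpha_{m-1} - t_{m-1,m}\sigma_m\right]$$
and then
$$\alpha_{m-2} = t_{m-2,m-2}\sigma_{m-2} + t_{m-2,m-1}\sigma_{m-1} + t_{m-2,m}\sigma_m$$
$$t_{m-2,m} = \alpha_{m-2}\sigma_m^H, \qquad t_{m-2,m-1} = \alpha_{m-2}\sigma_{m-1}^H$$
$$t_{m-2,m-2} = \left[\alpha_{m-2}\alpha_{m-2}^H - |t_{m-2,m-1}|^2 - |t_{m-2,m}|^2\right]^{1/2}$$
$$\sigma_{m-2} = \frac{1}{t_{m-2,m-2}}\left[\alpha_{m-2} - t_{m-2,m-1}\sigma_{m-1} - t_{m-2,m}\sigma_m\right]$$
In general, for $i = m-1, m-2, \ldots, 1$,
$$t_{ik} = \alpha_i\sigma_k^H, \qquad k = m, m-1, \ldots, i+1$$
$$t_{ii} = \left[\alpha_i\alpha_i^H - |t_{i,i+1}|^2 - \cdots - |t_{im}|^2\right]^{1/2}$$
$$\sigma_i = \frac{1}{t_{ii}}\left[\alpha_i - t_{i,i+1}\sigma_{i+1} - \cdots - t_{im}\sigma_m\right]$$
Thus, we have constructed T and S where $SS^H = I_m$ and $A = TS$. □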
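The backward iteration above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the author's code; it assumes (beyond the proposition's wording) that A is $m \times n$ with $m \le n$ and full row rank m, so every $t_{ii}$ is strictly positive. The helper names `row_dot` and `upper_ts` are hypothetical.

```python
# Sketch of Proposition 69: A = T S, T upper triangular with positive real
# diagonal, S S^H = I_m, built by the backward iteration i = m, ..., 1.
# Assumption: A is m x n (m <= n) with full row rank m.

def row_dot(u, v):
    """Return u v^H for row vectors u, v given as lists of numbers."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def upper_ts(A):
    m = len(A)
    T = [[0j] * m for _ in range(m)]
    S = [None] * m
    for i in range(m - 1, -1, -1):            # rows m, m-1, ..., 1 (1-based)
        alpha = [complex(x) for x in A[i]]
        for k in range(i + 1, m):             # t_ik = alpha_i sigma_k^H
            T[i][k] = row_dot(alpha, S[k])
        # t_ii = [alpha_i alpha_i^H - |t_{i,i+1}|^2 - ... - |t_im|^2]^(1/2)
        sq = row_dot(alpha, alpha).real - sum(abs(T[i][k]) ** 2 for k in range(i + 1, m))
        T[i][i] = complex(sq ** 0.5)
        # sigma_i = (alpha_i - t_{i,i+1} sigma_{i+1} - ... - t_im sigma_m) / t_ii
        resid = alpha[:]
        for k in range(i + 1, m):
            resid = [r - T[i][k] * s for r, s in zip(resid, S[k])]
        S[i] = [r / T[i][i] for r in resid]
    return T, S


A = [[1, 2j, 0], [1 + 1j, 1, 3]]   # illustrative 2 x 3 matrix of full row rank
T, S = upper_ts(A)
```

The subtraction formula for $t_{ii}$ mirrors the proof; in floating point, taking the norm of the residual row instead is the usual, slightly more stable, choice.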
Proposition 70 Let A be an $m \times n$ complex matrix. Then there exists a lower triangular $m \times m$ matrix T having positive real elements on the diagonal, and a subunitary $m \times n$ matrix S such that $SS^H = I_m$ and $A = TS$. This is motivated by C. R. Rao equation 1b.2(ix) [213]. This is a form of the QR decomposition, except that the orthonormal basis is the set of rows of S, and it is on the right side.

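Proof. I have followed C. R. Rao's proof in Proposition 67 almost exactly. Let $\alpha_1, \ldots, \alpha_m$ be the rows of A. Let $\sigma_1, \ldots, \sigma_m$ be the rows of S. Let $(t_{i1}, \ldots, t_{ii})$ be the nonzero part of the i-th row of T. Then we have
$$\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{pmatrix} = \begin{pmatrix} t_{11} & 0 & \cdots & 0 \\ t_{21} & t_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mm} \end{pmatrix} \begin{pmatrix} \sigma_1 \\ \vdots \\ \sigma_m \end{pmatrix}$$
Thus, we see that the rows of A are
$$\alpha_1 = t_{11}\sigma_1$$
$$\alpha_2 = t_{21}\sigma_1 + t_{22}\sigma_2$$
$$\vdots$$
$$\alpha_i = t_{i1}\sigma_1 + t_{i2}\sigma_2 + \cdots + t_{ii}\sigma_i$$
$$\vdots$$
$$\alpha_m = t_{m1}\sigma_1 + t_{m2}\sigma_2 + \cdots + t_{mm}\sigma_m$$
Let $\sigma_i\sigma_j^H = \delta_{ij}$. Given this constraint, we get $\alpha_i\sigma_j^H = t_{ij}$ for $1 \le j \le i-1$. Also note that
$$\alpha_i\alpha_i^H = |t_{i1}|^2 + \cdots + |t_{ii}|^2$$
where $1 \le i \le m$. Now, solve for $t_{ij}$ and $t_{ii}$:
$$t_{ij} = \alpha_i\sigma_j^H \quad \text{for } i \neq j$$
$$t_{ii} = \left[\alpha_i\alpha_i^H - |t_{i1}|^2 - \cdots - |t_{i,i-1}|^2\right]^{1/2}$$
Now, form the algorithm to construct T and S:
$$t_{11} = \left[\alpha_1\alpha_1^H\right]^{1/2}, \qquad \sigma_1 = \frac{1}{t_{11}}\alpha_1$$
$$t_{21} = \alpha_2\sigma_1^H$$
$$t_{22} = \left[\alpha_2\alpha_2^H - |t_{21}|^2\right]^{1/2}$$
$$\sigma_2 = \frac{1}{t_{22}}\left[\alpha_2 - t_{21}\sigma_1\right]$$
$$\vdots$$
$$t_{ij} = \alpha_i\sigma_j^H, \qquad 1 \le j \le i-1$$
$$t_{ii} = \left[\alpha_i\alpha_i^H - |t_{i1}|^2 - \cdots - |t_{i,i-1}|^2\right]^{1/2}$$
$$\sigma_i = \frac{1}{t_{ii}}\left[\alpha_i - t_{i1}\sigma_1 - t_{i2}\sigma_2 - \cdots - t_{i,i-1}\sigma_{i-1}\right]$$
Thus, $A = TS$ where $SS^H = I_m$ and T is lower triangular with positive real elements on the diagonal. □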
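The forward algorithm above is a row-wise classical Gram-Schmidt process. The following standard-library sketch (hypothetical helper names, not the author's code) assumes A has full row rank m with $m \le n$, so each $t_{ii} > 0$.

```python
# Sketch of Proposition 70: A = T S with T lower triangular (positive real
# diagonal) and S S^H = I_m, via row-wise classical Gram-Schmidt.
# Assumption: A is m x n (m <= n) with full row rank m.

def row_dot(u, v):
    """Return u v^H for row vectors u, v."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


def lower_ts(A):
    m = len(A)
    T = [[0j] * m for _ in range(m)]
    S = []
    for i in range(m):
        alpha = [complex(x) for x in A[i]]
        for j in range(i):                    # t_ij = alpha_i sigma_j^H, j < i
            T[i][j] = row_dot(alpha, S[j])
        # t_ii = [alpha_i alpha_i^H - |t_i1|^2 - ... - |t_{i,i-1}|^2]^(1/2)
        sq = row_dot(alpha, alpha).real - sum(abs(T[i][j]) ** 2 for j in range(i))
        T[i][i] = complex(sq ** 0.5)
        # sigma_i = (alpha_i - t_i1 sigma_1 - ... - t_{i,i-1} sigma_{i-1}) / t_ii
        resid = alpha[:]
        for j in range(i):
            resid = [r - T[i][j] * s for r, s in zip(resid, S[j])]
        S.append([r / T[i][i] for r in resid])
    return T, S


A = [[1, 2j, 0], [1 + 1j, 1, 3]]   # illustrative 2 x 3 matrix of full row rank
T, S = lower_ts(A)
```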


Proposition 71 Let A be an $m \times n$ complex matrix. Then there exists a lower triangular $n \times n$ matrix T having positive real elements on the diagonal, and a subunitary $m \times n$ matrix S such that $S^HS = I_n$ and $A = ST$. This is motivated by C. R. Rao equation 1b.2(ix) [213]. This is a form of the QR decomposition.


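Proof. I have followed C. R. Rao's proof in Proposition 67 almost exactly. Let $(\alpha_1, \ldots, \alpha_n)$ be the columns of A. Let $(\sigma_1, \ldots, \sigma_n)$ be the columns of S. Let $(t_{ii}, \ldots, t_{ni})$ be the nonzero elements of the i-th column of T. Then we have
$$(\alpha_1, \alpha_2, \ldots, \alpha_n) = (\sigma_1, \sigma_2, \ldots, \sigma_n)\begin{pmatrix} t_{11} & & & \\ t_{21} & t_{22} & & \\ \vdots & & \ddots & \\ t_{n1} & t_{n2} & \cdots & t_{nn} \end{pmatrix}$$
$$= \left(\sigma_1t_{11} + \cdots + \sigma_nt_{n1},\ \sigma_2t_{22} + \cdots + \sigma_nt_{n2},\ \ldots,\ \sigma_nt_{nn}\right)$$
Let $\sigma_i^H\sigma_j = \delta_{ij}$. Then $\sigma_i^H\alpha_j = t_{ij}$ for $i \neq j$, and
$$\alpha_i^H\alpha_i = |t_{ii}|^2 + |t_{i+1,i}|^2 + \cdots + |t_{ni}|^2$$
Thus $t_{ij} = \sigma_i^H\alpha_j$ and
$$t_{ii} = \left[\alpha_i^H\alpha_i - |t_{i+1,i}|^2 - \cdots - |t_{ni}|^2\right]^{1/2}$$
Now, write the algorithm to construct S and T:
$$t_{nn} = \left[\alpha_n^H\alpha_n\right]^{1/2}, \qquad \sigma_n = \frac{1}{t_{nn}}\alpha_n$$
$$t_{n,n-1} = \sigma_n^H\alpha_{n-1}$$
$$t_{n-1,n-1} = \left[\alpha_{n-1}^H\alpha_{n-1} - |t_{n,n-1}|^2\right]^{1/2}$$
$$\sigma_{n-1} = \frac{1}{t_{n-1,n-1}}\left[\alpha_{n-1} - \sigma_nt_{n,n-1}\right]$$
and in general, for $j = n-1, n-2, \ldots, 1$ (in reverse order),
$$t_{ij} = \sigma_i^H\alpha_j \quad \text{for } i = j+1, \ldots, n$$
$$t_{jj} = \left[\alpha_j^H\alpha_j - |t_{j+1,j}|^2 - \cdots - |t_{nj}|^2\right]^{1/2}$$
$$\sigma_j = \frac{1}{t_{jj}}\left[\alpha_j - \sigma_{j+1}t_{j+1,j} - \cdots - \sigma_nt_{nj}\right]$$
Thus we have found lower triangular T and subunitary S such that $A = ST$. □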
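The column version runs the same kind of recursion over the columns of A in reverse order. A minimal standard-library sketch, assuming A is $m \times n$ with $m \ge n$ and full column rank n (helper names are hypothetical):

```python
# Sketch of Proposition 71: A = S T with S^H S = I_n and T lower triangular
# (positive real diagonal), processing columns j = n, n-1, ..., 1.
# Assumption: A is m x n (m >= n) with full column rank n.

def col(A, j):
    return [complex(row[j]) for row in A]


def vdot(u, v):
    """Return u^H v for column vectors u, v."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))


def st_lower(A):
    n = len(A[0])
    T = [[0j] * n for _ in range(n)]
    sigma = [None] * n
    for j in range(n - 1, -1, -1):
        a_j = col(A, j)
        for i in range(j + 1, n):             # t_ij = sigma_i^H alpha_j, i > j
            T[i][j] = vdot(sigma[i], a_j)
        sq = vdot(a_j, a_j).real - sum(abs(T[i][j]) ** 2 for i in range(j + 1, n))
        T[j][j] = complex(sq ** 0.5)
        # sigma_j = (alpha_j - sigma_{j+1} t_{j+1,j} - ... - sigma_n t_nj) / t_jj
        resid = a_j[:]
        for i in range(j + 1, n):
            resid = [r - s * T[i][j] for r, s in zip(resid, sigma[i])]
        sigma[j] = [r / T[j][j] for r in resid]
    # Assemble S (m x n) from its columns sigma_1, ..., sigma_n.
    S = [[sigma[j][r] for j in range(n)] for r in range(len(A))]
    return S, T


A = [[1, 1j], [2, 0], [0, 1]]   # illustrative 3 x 2 matrix of full column rank
S, T = st_lower(A)
```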


M.2  Similarity  Transformation 

Lemma 51 Let B be a nonsingular complex $n \times n$ matrix, and let A be any other complex $n \times n$ matrix. Then A and $B^{-1}AB$ have the same eigenvalues.

Proof.
$$\det(B^{-1}AB - \lambda^2I) = \det(B^{-1}AB - \lambda^2B^{-1}IB)$$
$$= \det[B^{-1}(A - \lambda^2I)B] = \det(B^{-1})\det(A - \lambda^2I)\det(B)$$
$$= \det(A - \lambda^2I)\det(B^{-1}B) = \det(A - \lambda^2I)$$
when B is nonsingular. □

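Theorem 112 Let $s, t \in \mathbb{F}^{n \times n}$, where $\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$. Let $\lambda^2(\cdot)$ denote the set of eigenvalues of its argument. Then
$$\lambda^2(st) = \lambda^2(ts)$$
Proof. I do not have a record of the pedigree of this theorem or its proof. When s is nonsingular, the result follows by a similarity transformation, using Lemma 51:
$$\lambda^2(st) = \lambda^2\left(s^{-1}(st)s\right) = \lambda^2(ts)$$
When s is singular, $s + \epsilon I$ is nonsingular for all sufficiently small $\epsilon \neq 0$, so the characteristic polynomials of $(s + \epsilon I)t$ and $t(s + \epsilon I)$ agree for those $\epsilon$; since their coefficients are polynomials in $\epsilon$, they also agree at $\epsilon = 0$. □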
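One way to check Theorem 112 numerically, including a singular s, is to compare the characteristic polynomials of st and ts. The sketch below uses the Faddeev-LeVerrier recursion, a standard technique swapped in here for illustration (it is not part of the thesis); the function names and test matrices are hypothetical.

```python
# Compare det(lambda I - st) and det(lambda I - ts) via the
# Faddeev-LeVerrier recursion; works even when s is singular.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def char_poly(A):
    """Return [c_n, c_{n-1}, ..., c_0] (c_n = 1) of det(lambda I - A)."""
    n = len(A)
    coeffs = [1 + 0j]
    M = [[0j] * n for _ in range(n)]                      # M_0 = 0
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        AMk = matmul(A, M)
        # c_{n-k} = -tr(A M_k) / k
        coeffs.append(-sum(AMk[i][i] for i in range(n)) / k)
    return coeffs


s3 = [[1, 0, 1], [0, 1j, 0], [1, 0, 1]]     # singular: first and last rows equal
t3 = [[2, 1, 0], [0, 1, 1j], [1, 0, 3]]
p_st = char_poly(matmul(s3, t3))
p_ts = char_poly(matmul(t3, s3))
```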

Corollary 32 Let U be a unitary complex $n \times n$ matrix, and let A be any other complex $n \times n$ matrix. Then A and $U^HAU$ have the same eigenvalues.

Proof. Although I provided this proof, the result is common knowledge; it is included here for completeness. Note that $U^{-1} = U^H$. Apply Theorem 112 and the result follows immediately. A longer proof follows.
$$\det(U^HAU - \lambda^2I) = \det(U^HAU - \lambda^2U^HIU)$$
$$= \det[U^H(A - \lambda^2I)U] = \det(U^H)\det(A - \lambda^2I)\det(U) = \det(A - \lambda^2I)$$
Recall that $\det(U) = e^{i\theta}$ by Lemma 43, and
$$\det(U^H)\det(U) = 1$$
□
Corollary 33 Let V be an orthonormal complex matrix such that $V^TV = I$, and let A be any other $n \times n$ complex matrix. Then A and $V^TAV$ have the same eigenvalues.

Proof. Although I provided this proof, the result is common knowledge; it is included here for completeness. Note that $V^{-1} = V^T$. Apply Theorem 112. The result follows immediately. An alternate proof follows.
$$\det(V^TAV - \lambda^2I) = \det(V^TAV - \lambda^2V^TIV)$$
$$= \det[V^T(A - \lambda^2I)V] = \det(V^T)\det(A - \lambda^2I)\det(V) = \det(A - \lambda^2I)$$
since $\det(V^T)\det(V) = \det(V^TV) = 1$. □

M.3 Transformation to a Triangular Matrix with the Same Eigenvalues

Lemma 52 Let $A \in \mathbb{C}^{m \times m}$. Then there exists a unitary matrix U such that $U^HAU$ is an upper triangular matrix whose diagonal elements are the eigenvalues of A. This is a complexification of Muirhead's theorem A9.1 [187].

Proof. This is a complexification of Muirhead's proof. Let $\lambda_1^2, \ldots, \lambda_m^2$ be the eigenvalues of A, and let $x_1$ be an eigenvector of A corresponding to $\lambda_1^2$. Let $x_2, \ldots, x_m$ be any other vectors such that $x_1, x_2, \ldots, x_m$ form a basis for $\mathbb{C}^m$. Using the inner-product Gram-Schmidt orthonormalization process given in section L.1, construct from $x_1, x_2, \ldots, x_m$ an orthonormal basis given as the columns of the unitary matrix $U_1$, where the first column $u_1$ is proportional to $x_1$, so that $u_1$ is also an eigenvector of A corresponding to $\lambda_1^2$. Then the first column of $AU_1$ is $Au_1 = \lambda_1^2u_1$, and hence the first column of $U_1^HAU_1$ is $\lambda_1^2U_1^Hu_1$. Since this is the first column of $\lambda_1^2U_1^HU_1 = \lambda_1^2I_m$, it is
$$\begin{pmatrix} \lambda_1^2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Hence
$$U_1^HAU_1 = \begin{pmatrix} \lambda_1^2 & B_1 \\ 0 & A_2 \end{pmatrix}$$
where $A_2$ is $(m-1) \times (m-1)$. By Lemma 43, $\det(U_1) = e^{i\theta}$. Thus
$$\det(U_1^HAU_1 - \lambda^2I_m) = \det(U_1^HAU_1 - \lambda^2U_1^HI_mU_1) = \det(U_1^H)\det(A - \lambda^2I_m)\det(U_1) = \det(A - \lambda^2I_m)$$
and
$$\det\begin{pmatrix} \lambda_1^2 - \lambda^2 & B_1 \\ 0 & A_2 - \lambda^2I_{m-1} \end{pmatrix} = (\lambda_1^2 - \lambda^2)\det(A_2 - \lambda^2I_{m-1})$$
Since A and $U_1^HAU_1$ have the same eigenvalues, the eigenvalues of $A_2$ are $\lambda_2^2, \ldots, \lambda_m^2$.

Now, using a construction similar to that above, we want to find a unitary $(m-1) \times (m-1)$ matrix $U_2$ whose first column is an eigenvector of $A_2$ corresponding to $\lambda_2^2$. Then
$$U_2^HA_2U_2 = \begin{pmatrix} \lambda_2^2 & B_2 \\ 0 & A_3 \end{pmatrix}$$
where $A_3$ is $(m-2) \times (m-2)$ with eigenvalues $\lambda_3^2, \ldots, \lambda_m^2$.

Repeating this procedure an additional $m-3$ times, we now define the unitary matrix
$$U = U_1\begin{pmatrix} 1 & 0 \\ 0 & U_2 \end{pmatrix}\begin{pmatrix} I_2 & 0 \\ 0 & U_3 \end{pmatrix}\cdots\begin{pmatrix} I_{m-2} & 0 \\ 0 & U_{m-1} \end{pmatrix}$$
Note that $U^HAU$ is upper triangular with diagonal elements equal to $\lambda_1^2, \ldots, \lambda_m^2$.
□
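For $m = 2$, the first step of the proof already yields the full triangularization: normalize an eigenvector $x_1$ to get $u_1$, complete it to a unitary $U_1$, and $U_1^HAU_1$ is upper triangular. The sketch below does this in closed form; completing $u_1 = (p, q)^T$ with $(-\bar q, \bar p)^T$ is a 2 x 2 shortcut assumed here in place of the general Gram-Schmidt step, and all function names are hypothetical.

```python
import cmath

# First step of Lemma 52 for a 2 x 2 complex matrix (illustrative sketch).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def conj_t(X):
    return [[complex(X[j][i]).conjugate() for j in range(len(X))] for i in range(len(X[0]))]


def eigvals_2x2(A):
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def eigvec_2x2(A, lam, tol=1e-14):
    """An eigenvector x with A x = lam x (lam must be an eigenvalue of A)."""
    (a, b), (c, d) = A
    if abs(b) > tol:
        return (b, lam - a)
    if abs(c) > tol:
        return (lam - d, c)
    return (1, 0) if abs(lam - a) <= tol else (0, 1)


def schur_2x2(A):
    lam1, _ = eigvals_2x2(A)
    p, q = eigvec_2x2(A, lam1)
    nrm = (abs(p) ** 2 + abs(q) ** 2) ** 0.5
    p, q = complex(p) / nrm, complex(q) / nrm          # u_1 = x_1 / ||x_1||
    U = [[p, -q.conjugate()], [q, p.conjugate()]]      # unitary, first column u_1
    return U, matmul(matmul(conj_t(U), A), U)          # U^H A U is upper triangular


A = [[1, 2], [3j, 4]]
U, T = schur_2x2(A)
```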

Lemma 53 Let $A \in \mathbb{C}^{m \times m}$. Then there exists an orthonormal matrix V such that $V^TAV$ is an upper triangular matrix whose diagonal elements are the eigenvalues of A. This is a corollary to a complexification of Muirhead's theorem A9.1 [187].

Proof. This is a complexification and adaptation of Muirhead's proof of his theorem A9.1. Note that even though a transpose appears in the problem, this is still different from the real case.

Let $\lambda_1^2, \ldots, \lambda_m^2$ be the eigenvalues of A, and let $x_1$ be an eigenvector of A corresponding to $\lambda_1^2$. Let $x_2, \ldots, x_m$ be any other vectors such that $x_1, x_2, \ldots, x_m$ form a basis for $\mathbb{C}^m$. Using the bilinear Gram-Schmidt orthonormalization process given in section L.2, construct from $x_1, x_2, \ldots, x_m$ an orthonormal basis given as the columns of the orthonormal complex matrix $V_1$, where the first column $v_1$ is proportional to $x_1$, so that $v_1$ is also an eigenvector of A corresponding to $\lambda_1^2$. Then the first column of $AV_1$ is $Av_1 = \lambda_1^2v_1$, and hence the first column of $V_1^TAV_1$ is $\lambda_1^2V_1^Tv_1$. Since this is the first column of $\lambda_1^2V_1^TV_1 = \lambda_1^2I_m$, it is
$$\begin{pmatrix} \lambda_1^2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Hence
$$V_1^TAV_1 = \begin{pmatrix} \lambda_1^2 & B_1 \\ 0 & A_2 \end{pmatrix}$$
where $A_2$ is $(m-1) \times (m-1)$. By Lemma 44, $\det(V_1) = \pm 1$. Thus
$$\det(V_1^TAV_1 - \lambda^2I_m) = \det(V_1^TAV_1 - \lambda^2V_1^TI_mV_1) = \det(V_1^T)\det(A - \lambda^2I_m)\det(V_1) = \det(A - \lambda^2I_m)$$
and
$$\det\begin{pmatrix} \lambda_1^2 - \lambda^2 & B_1 \\ 0 & A_2 - \lambda^2I_{m-1} \end{pmatrix} = (\lambda_1^2 - \lambda^2)\det(A_2 - \lambda^2I_{m-1})$$
Since A and $V_1^TAV_1$ have the same eigenvalues, the eigenvalues of $A_2$ are $\lambda_2^2, \ldots, \lambda_m^2$.

Now, using a construction similar to that above, we want to find an orthonormal $(m-1) \times (m-1)$ matrix $V_2$ whose first column is an eigenvector of $A_2$ corresponding to $\lambda_2^2$. Then
$$V_2^TA_2V_2 = \begin{pmatrix} \lambda_2^2 & B_2 \\ 0 & A_3 \end{pmatrix}$$
where $A_3$ is $(m-2) \times (m-2)$ with eigenvalues $\lambda_3^2, \ldots, \lambda_m^2$.

Repeating this procedure an additional $m-3$ times, we now define the orthonormal matrix
$$V = V_1\begin{pmatrix} 1 & 0 \\ 0 & V_2 \end{pmatrix}\begin{pmatrix} I_2 & 0 \\ 0 & V_3 \end{pmatrix}\cdots\begin{pmatrix} I_{m-2} & 0 \\ 0 & V_{m-1} \end{pmatrix}$$
Note that $V^TAV$ is upper triangular with diagonal elements equal to $\lambda_1^2, \ldots, \lambda_m^2$.
Note that the nature of the $\lambda_i^2$ was not at issue here: they are not necessarily real. Also, compared to Lemma 52, $V \neq U$ in general, even though the eigenvalues are the same in both cases. □

M.4  Functions  of  Eigenvalues 

Theorem 113 Let A be an $n \times n$ complex matrix with eigenvalues $\lambda_1^2, \ldots, \lambda_n^2$. Then $\mathrm{tr}(A) = \sum_{i=1}^{n}\lambda_i^2$. This is a complexification of Graybill theorem 9.1.3 [95].

Proof. By Lemma 52, there exists a unitary matrix U such that $U^HAU$ is an upper triangular matrix whose diagonal elements are the eigenvalues of A. Call it T. By the property of the trace function $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, we see that
$$\mathrm{tr}(A) = \mathrm{tr}(AI) = \mathrm{tr}(AUU^H) = \mathrm{tr}(U^HAU) = \mathrm{tr}(T) = \sum_{i=1}^{n}\lambda_i^2$$
□

Theorem 114 Let A be an $n \times n$ complex matrix with eigenvalues $\lambda_1^2, \ldots, \lambda_n^2$. Then
$$\det(A) = \prod_{i=1}^{n}\lambda_i^2$$
Proof. By Lemma 52, there exists a unitary matrix U such that $U^HAU$ is an upper triangular matrix whose diagonal elements are the eigenvalues of A. Call it T. By Lemma 43, $\det(U) = e^{i\theta}$. Thus
$$\det(U^H)\det(U) = \det(U^HU) = \det(I) = 1$$
and, since U and A are conformable square matrices,
$$\det(A) = \det(U^H)\det(A)\det(U) = \det(U^HAU)$$
Thus
$$\det(A) = \det(T) = \prod_{i=1}^{n}\lambda_i^2$$
□
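Theorems 113 and 114 can be illustrated by running the proof backwards: pick an upper triangular T with chosen diagonal (the eigenvalues), a unitary U, and set $A = UTU^H$. The sketch assumes the normalized 3 x 3 DFT matrix as U purely for illustration; the triangular T and helper names are hypothetical.

```python
import cmath

# Build A = U T U^H and compare tr(A), det(A) with the diagonal of T.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def conj_t(X):
    return [[complex(X[j][i]).conjugate() for j in range(len(X))] for i in range(len(X[0]))]


def det3(M):
    """Cofactor expansion of a 3 x 3 determinant along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


n = 3
w = cmath.exp(2j * cmath.pi / n)
U = [[w ** (j * k) / n ** 0.5 for k in range(n)] for j in range(n)]   # unitary DFT matrix
T = [[2, 1j, 5], [0, -1 + 1j, 3], [0, 0, 0.5j]]                     # diagonal = eigenvalues
A = matmul(matmul(U, T), conj_t(U))                                   # A = U T U^H

trace_A = sum(A[i][i] for i in range(n))
det_A = det3(A)
```

Up to rounding, `trace_A` equals $2 + (-1 + i) + 0.5i$ and `det_A` equals $2(-1 + i)(0.5i) = -1 - i$.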

M.5  Eigenvalue  Decomposition 

Theorem 115 (Very Important!) If A is a Hermitian $m \times m$ matrix with eigenvalues $\lambda_1^2, \ldots, \lambda_m^2$, then there exists a unitary matrix U such that
$$U^HAU = D = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2) = \Lambda^2$$
If $U = [U_1, \ldots, U_m]$, then $U_i$ is an eigenvector of A corresponding to the eigenvalue $\lambda_i^2$. Moreover, if $\lambda_1^2, \ldots, \lambda_m^2$ are all distinct, then the representation $U^HAU = D$ is unique up to phase changes in the first row of U. This is a complexification of Muirhead's theorem A9.2 [187].
Proof. This is a complexification of Muirhead's proof. From Lemma 52, there exists a unitary matrix $U_1$ such that
$$U_1^HAU_1 = \begin{pmatrix} \lambda_1^2 & B_1 \\ 0 & A_2 \end{pmatrix}$$
where $\lambda_2^2, \ldots, \lambda_m^2$ are the eigenvalues of $A_2$. $A^H = A$ implies that
$$(U_1^HAU_1)^H = U_1^HA^HU_1 = U_1^HAU_1$$
is also Hermitian. Thus $B_1$ is a zero matrix, $B_1 = 0$. Similarly, each $B_i$ in the proof of Lemma 52 is zero ($i = 1, \ldots, m-1$). Thus, the U given in Lemma 52 satisfies
$$U^HAU = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2) = \Lambda^2$$
Observe that $UU^HAU = AU = U\Lambda^2$. Consequently $AU_i = U_i\lambda_i^2$, so that $U_i$ is an eigenvector of A corresponding to the eigenvalue $\lambda_i^2$.

Now, suppose that we also have $Q^HAQ = D$ for a unitary matrix Q. Let $P = Q^HU$. Then
$$PD = (Q^HU)(U^HAU) = Q^HAU$$
and
$$DP = (Q^HAQ)(Q^HU) = Q^HAU$$
Thus $PD = DP$. If $P = (p_{ij})$, it follows that $p_{ij}\lambda_j^2 = p_{ij}\lambda_i^2$. Since $\lambda_i^2 \neq \lambda_j^2$ by hypothesis, $p_{ij} = 0$ for all $i \neq j$. Note that
$$P^HP = U^HQQ^HU = I$$
Since P is unitary and diagonal, it must have the form $P = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_m})$. Thus
$$U = QP = Q\,\mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_m})$$
Note that when working with $A \in \mathbb{R}^{m \times m}$, this last property simplifies to saying that if $\lambda_1^2, \ldots, \lambda_m^2$ are all distinct, then the representation $U^TAU = D$ is unique up to sign changes in the first row of U. □

Caution. Not all eigenvalue decompositions can be written in the form $A = Q\Lambda^2Q^H$. The hypothesis of Theorem 115 that A is Hermitian gives us the form we are familiar with. When A is not Hermitian, we get the form $A = Q\Lambda^2Q^{-1}$.


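Example. Let
$$A = \begin{pmatrix} 4 & -1 \\ 2 & 1 \end{pmatrix}$$
Then
$$\begin{pmatrix} 4 & -1 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
and
$$\begin{pmatrix} 4 & -1 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 3\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
Thus, the eigenvalues are $\{2, 3\}$ with associated nonnormalized eigenvectors $(1, 2)^T$ and $(1, 1)^T$. The normalized eigenvectors are $(1/\sqrt{5}, 2/\sqrt{5})^T$ and $(1/\sqrt{2}, 1/\sqrt{2})^T$. With
$$Q = \begin{pmatrix} 1/\sqrt{5} & 1/\sqrt{2} \\ 2/\sqrt{5} & 1/\sqrt{2} \end{pmatrix}, \qquad \Lambda^2 = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix},$$
we observe
$$Q\Lambda^2Q^H = \begin{pmatrix} \frac{2}{5} + \frac{3}{2} & \frac{4}{5} + \frac{3}{2} \\ \frac{4}{5} + \frac{3}{2} & \frac{8}{5} + \frac{3}{2} \end{pmatrix} = \begin{pmatrix} 19/10 & 23/10 \\ 23/10 & 31/10 \end{pmatrix} \neq \begin{pmatrix} 4 & -1 \\ 2 & 1 \end{pmatrix}$$
Thus we do not get back A when we compute $Q\Lambda^2Q^H$. However, noting that
$$Q^{-1} = \begin{pmatrix} -\sqrt{5} & \sqrt{5} \\ 2\sqrt{2} & -\sqrt{2} \end{pmatrix}$$
we find that $A = Q\Lambda^2Q^{-1}$. □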
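The arithmetic of this example can be checked directly. The sketch below (standard library only; variable names are illustrative) forms both $Q\Lambda^2Q^H$ and $Q\Lambda^2Q^{-1}$ from the matrices in the example; since Q is real, $Q^H = Q^T$.

```python
import math

# Numerical check of the non-Hermitian example A = [[4, -1], [2, 1]].

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(r) for r in zip(*X)]


r5, r2 = math.sqrt(5), math.sqrt(2)
A = [[4, -1], [2, 1]]
Q = [[1 / r5, 1 / r2], [2 / r5, 1 / r2]]    # normalized eigenvectors as columns
Lam2 = [[2, 0], [0, 3]]                      # eigenvalues 2 and 3
Q_inv = [[-r5, r5], [2 * r2, -r2]]

QLQh = matmul(matmul(Q, Lam2), transpose(Q))  # Q Lambda^2 Q^H (Q is real)
QLQinv = matmul(matmul(Q, Lam2), Q_inv)       # Q Lambda^2 Q^{-1}
```

Up to rounding, `QLQh` is $\begin{pmatrix} 1.9 & 2.3 \\ 2.3 & 3.1 \end{pmatrix}$, while `QLQinv` reproduces A.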


Theorem 116 Let $A^T = A \in M_m(\mathbb{C})$ have eigenvalues $\lambda_1^2, \ldots, \lambda_m^2$. Then there exists a matrix V such that $V^TV = I$ and
$$V^TAV = D = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2) = \Lambda^2$$
If $V = [V_1, \ldots, V_m]$, then $V_i$ is an eigenvector of A corresponding to the eigenvalue $\lambda_i^2$. Moreover, if $\lambda_1^2, \ldots, \lambda_m^2$ are all distinct, then the representation $V^TAV = D$ is unique up to sign changes in the first row of V. This is a corollary to a complexification of Muirhead's very important theorem A9.2 [187].

Proof. This is a complexification of Muirhead's proof of his theorem A9.2. From Lemma 53, there is an orthonormal $m \times m$ matrix $V_1$ such that
$$V_1^TAV_1 = \begin{pmatrix} \lambda_1^2 & B_1 \\ 0 & A_2 \end{pmatrix}$$
where $\lambda_2^2, \ldots, \lambda_m^2$ are the eigenvalues of $A_2$. $A^T = A$ implies
$$(V_1^TAV_1)^T = V_1^TA^TV_1 = V_1^TAV_1$$
Thus $B_1$ is a zero matrix. Similarly, each $B_i$ in the proof of Lemma 53 is zero for $1 \le i \le m-1$. Thus the V of Lemma 53 satisfies
$$V^TAV = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2) = \Lambda^2$$
Now, $V^TV = I$ implies $V^T = V^{-1}$, which in turn implies $VV^T = I$. So,
$$VV^TAV = AV = V\Lambda^2$$
Consequently, $AV_i = V_i\lambda_i^2$, so that $V_i$ is an eigenvector of A corresponding to the eigenvalue $\lambda_i^2$. Also, suppose $Q^TAQ = D$ for $Q^TQ = I$. Let $P = Q^TV$. Then
$$PD = (Q^TV)(V^TAV) = Q^TAV$$
and
$$DP = (Q^TAQ)(Q^TV) = Q^TAV$$
Thus $PD = DP$. Let $P = (p_{ij})$. Then $p_{ij}\lambda_j^2 = p_{ij}\lambda_i^2$. If $\lambda_i^2 \neq \lambda_j^2$, then $p_{ij} = 0$ for all $i \neq j$. Note that
$$P^TP = V^TQQ^TV = I$$
Since P is orthonormal and diagonal, $p_{ii}^2 = 1$, so
$$P = \mathrm{diag}(\pm 1, \pm 1, \ldots, \pm 1)$$
Thus
$$V = QP = Q\,\mathrm{diag}(\pm 1, \pm 1, \ldots, \pm 1)$$
□

M.6 Hermitian Definiteness

Corollary 34 If A is a Hermitian $m \times m$ matrix with eigenvalues $\lambda_1^2, \ldots, \lambda_m^2$, then $\lambda_i^2 \in \mathbb{R}$ for all i. This is a corollary to a complexification of Muirhead's theorem A9.2 [187]. It is a widely known result.

Proof. From Theorem 115, we know there exists a unitary U such that
$$U^HAU = \Lambda^2 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2)$$
Since $A = A^H$, we know
$$(U^HAU)^H = U^HAU$$
Therefore $(\Lambda^2)^H = \Lambda^2$. This can only be true if $\lambda_i^2 \in \mathbb{R}$ for all i.

Note that when $A^T = A \in M_m(\mathbb{C})$, we have an orthonormal V such that
$$V^TAV = \Lambda^2$$
$A = A^T$ implies that $(V^TAV)^T = V^TAV$, and thus $(\Lambda^2)^T = \Lambda^2$. However, we cannot deduce from this condition that $\lambda_i^2 \in \mathbb{R}$ for any i. This is a fundamental difference between complex symmetric matrices and Hermitian matrices: one cannot automatically assume definiteness for a complex symmetric matrix. □
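A 2 x 2 illustration of this difference, using example matrices chosen here for illustration: the complex symmetric matrix $\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$ has characteristic polynomial $(1 - \lambda)^2 + 1$ and eigenvalues $1 \pm i$, while the Hermitian matrix $\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}$ has real eigenvalues 0 and 2. The sketch computes both with the quadratic formula.

```python
import cmath

# Eigenvalues of a 2 x 2 complex matrix from its characteristic polynomial.

def eig2(A):
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


sym = [[1, 1j], [1j, 1]]     # complex symmetric: A^T = A, but A^H != A
herm = [[1, 1j], [-1j, 1]]   # Hermitian: A^H = A
eig_sym = eig2(sym)          # 1 + i and 1 - i: not real
eig_herm = eig2(herm)        # 2 and 0: real, as Corollary 34 requires
```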

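Theorem 117 The $m \times m$ Hermitian matrix A is positive (negative) (semi-)definite if and only if the matrix
$$\Lambda^2 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2)$$
of eigenvalues also is. This is Stewart's corollary 6.5.3 [259].

Proof. Let U and $\Lambda^2$ be the matrices of eigenvectors and corresponding eigenvalues of A. Recall that $U^HAU = \Lambda^2$. Let $y = Ux$ for all $x \in \mathbb{C}^m$.

$\Lambda^2$ is positive (negative) (semi-)definite if and only if $U^HAU$ is. For example, $x^H\Lambda^2x > 0$ means $x^HU^HAUx > 0$ for all nonzero x in $\mathbb{C}^m$, that is, $y^HAy > 0$ for all nonzero $y = Ux$. Since U is nonsingular, $y = Ux$ is a one-to-one mapping: as x ranges over all of $\mathbb{C}^m$, y also ranges over all of $\mathbb{C}^m$. The inequality can be any of $>$, $\ge$, $<$, or $\le$. Thus
$$x^H\Lambda^2x > 0\ \forall x \neq 0 \text{ in } \mathbb{C}^m \iff y^HAy > 0\ \forall y \neq 0 \text{ in } \mathbb{C}^m$$
$$x^H\Lambda^2x \ge 0\ \forall x \neq 0 \text{ in } \mathbb{C}^m \iff y^HAy \ge 0\ \forall y \neq 0 \text{ in } \mathbb{C}^m$$
$$x^H\Lambda^2x < 0\ \forall x \neq 0 \text{ in } \mathbb{C}^m \iff y^HAy < 0\ \forall y \neq 0 \text{ in } \mathbb{C}^m$$
$$x^H\Lambda^2x \le 0\ \forall x \neq 0 \text{ in } \mathbb{C}^m \iff y^HAy \le 0\ \forall y \neq 0 \text{ in } \mathbb{C}^m$$
Note that $x^H\Lambda^2x = \sum_{i=1}^{m}|x_i|^2\lambda_i^2$; taking x to be each unit vector in turn gives one direction, and the sum gives the other. This means
$$x^H\Lambda^2x > 0\ \forall x \neq 0 \text{ in } \mathbb{C}^m \iff \lambda_i^2 > 0\ \forall i$$
$$x^H\Lambda^2x \ge 0\ \forall x \neq 0 \text{ in } \mathbb{C}^m \iff \lambda_i^2 \ge 0\ \forall i$$
$$x^H\Lambda^2x < 0\ \forall x \neq 0 \text{ in } \mathbb{C}^m \iff \lambda_i^2 < 0\ \forall i$$
$$x^H\Lambda^2x \le 0\ \forall x \neq 0 \text{ in } \mathbb{C}^m \iff \lambda_i^2 \le 0\ \forall i$$
□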

Theorem 118 The $m \times m$ matrix $A^{-1}$ is Hermitian positive (negative) definite if and only if A is Hermitian positive (negative) definite.

Proof. By Theorem 115, let $A = U\Lambda^2U^H$, where $\Lambda^2 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2)$ is the matrix of eigenvalues with corresponding eigenvectors in matrix U. Let $\lambda_i^2 \neq 0$ for all i. Then
$$A^{-1} = (U\Lambda^2U^H)^{-1} = (U^H)^{-1}(\Lambda^2)^{-1}U^{-1}$$
U is unitary. Thus $U^{-1} = U^H$ and $U = (U^H)^{-1}$, which implies $A^{-1} = U(\Lambda^2)^{-1}U^H$, where
$$(\Lambda^2)^{-1} = \mathrm{diag}(1/\lambda_1^2, \ldots, 1/\lambda_m^2)$$
Note that $1/\lambda_i^2 > 0$ if and only if $\lambda_i^2 > 0$, and $1/\lambda_i^2 < 0$ if and only if $\lambda_i^2 < 0$. Therefore, $\lambda_i^2 > 0$ for all i if and only if $1/\lambda_i^2 > 0$ for all i. Thus, by Theorem 117, A is Hermitian positive definite if and only if $A^{-1}$ is Hermitian positive definite. Similarly, A is Hermitian negative definite if and only if $A^{-1}$ is Hermitian negative definite.
□


M.7  Square  Root  Decomposition 

Theorem 119 (Very Important!) Let A be a non-negative definite complex $m \times m$ matrix. Then there exists a non-negative definite complex $m \times m$ matrix, written as $A^{1/2}$, such that $A = A^{1/2}(A^{1/2})^H$. There also exists a $B^{1/2}$ such that $(B^{1/2})^HB^{1/2} = A$. These are Hermitian square root matrices. Their existence provides a key to obtaining numerically robust methods, such as Kalman square root filtering. This is an important complexification of Muirhead's theorem A9.3 [187]. These are widely known results.

Proof. This is a complexification of Muirhead's proof. Let H be a unitary matrix such that $H^HAH = D$, where
$$D = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2)$$
with $\lambda_1^2, \ldots, \lambda_m^2$ being the eigenvalues of A. Since A is nonnegative definite, $\lambda_i^2 \ge 0$ for $i = 1, \ldots, m$. Let
$$D^{1/2} = \mathrm{diag}(\lambda_1, \ldots, \lambda_m) = (D^{1/2})^H$$
with each $\lambda_i \ge 0$. Then $D^{1/2}(D^{1/2})^H = D$. Let $A^{1/2} = HD^{1/2}H^H$. Then
$$A^{1/2}(A^{1/2})^H = HD^{1/2}H^HH(D^{1/2})^HH^H = HDH^H = A$$
Therefore $A^{1/2}(A^{1/2})^H = A$.

Similarly, there exists a $B^{1/2}$ such that $(B^{1/2})^HB^{1/2} = A$. Let $(B^{1/2})^H = A^{1/2}$, and the proof is complete. If A is positive definite, then $A^{1/2}$ is also positive definite. □

Theorem 120 Let A be a non-negative definite complex $m \times m$ matrix. Then there exists a non-negative definite complex $m \times m$ matrix, written as $A^{1/2}$, such that $A = A^{1/2}A^{1/2}$. This is another complexification of Muirhead's theorem A9.3 [187]. This is a widely known result.

Proof. This is a complexification of Muirhead's proof. Let H be a unitary matrix such that $H^HAH = D$, where $D = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2)$ with $\lambda_1^2, \ldots, \lambda_m^2$ being the eigenvalues of A. Since A is nonnegative definite, $\lambda_i^2 \ge 0$ for $i = 1, \ldots, m$. Let
$$D^{1/2} = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$$
Then $D^{1/2}D^{1/2} = D$. Let $A^{1/2} = HD^{1/2}H^H$. Then
$$A^{1/2}A^{1/2} = HD^{1/2}H^HHD^{1/2}H^H = HDH^H = A$$
Therefore $A^{1/2}A^{1/2} = A$.

If A is positive definite, then $A^{1/2}$ is also positive definite.

Examining the proof, note that $A^{1/2} = (A^{1/2})^H$. □
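The construction $A^{1/2} = HD^{1/2}H^H$ can be sketched for a 2 x 2 Hermitian matrix using a closed-form eigendecomposition. The closed form (eigenvector $(b, \lambda - a)^T$ for $A = \begin{pmatrix} a & b \\ \bar b & d \end{pmatrix}$ with $b \neq 0$) is an illustrative assumption that only covers the 2 x 2 case; the function names are hypothetical.

```python
import math

# Hermitian square root A^{1/2} = H D^{1/2} H^H for a 2 x 2 Hermitian A >= 0.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def conj_t(X):
    return [[complex(X[j][i]).conjugate() for j in range(len(X))] for i in range(len(X[0]))]


def hermitian_eig2(A):
    """Return (eigenvalues, H) with H unitary and H^H A H = diag(eigenvalues)."""
    a, b, d = A[0][0].real, A[0][1], A[1][1].real
    if abs(b) < 1e-15:                                  # already diagonal
        return [a, d], [[1 + 0j, 0j], [0j, 1 + 0j]]
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    lams = [mid + rad, mid - rad]                       # distinct since b != 0
    cols = []
    for lam in lams:
        v = (b, lam - a)                                # (A - lam I) v = 0
        nrm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
        cols.append((v[0] / nrm, v[1] / nrm))
    H = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
    return lams, H


def psd_sqrt2(A):
    lams, H = hermitian_eig2(A)
    roots = [math.sqrt(max(l, 0.0)) for l in lams]      # lambda_i >= 0; clip rounding
    HD = [[H[i][j] * roots[j] for j in range(2)] for i in range(2)]
    return matmul(HD, conj_t(H))                        # H D^{1/2} H^H


A = [[2, 1j], [-1j, 2]]      # Hermitian, eigenvalues 3 and 1
R = psd_sqrt2(A)
```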


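Theorem 121 Let A be an $m \times m$ non-negative definite complex matrix of rank r. Then (i) there is an $m \times r$ matrix B of rank r such that $A = BB^H$, and (ii) there is an $m \times m$ nonsingular matrix C such that
$$A = C\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}C^H$$
This is a complexification of Muirhead's theorem A9.4 [187].

Proof. This is a complexification of Muirhead's proof. First, we prove (i). Let $D_1 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_r^2)$, where $\lambda_1^2, \ldots, \lambda_r^2$ are the nonzero eigenvalues of A. Let H be an $m \times m$ unitary matrix such that
$$H^HAH = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_r^2, 0, \ldots, 0)$$
Partition H as $H = [H_1, H_2]$, where $H_1$ is $m \times r$ and $H_2$ is $m \times (m-r)$. Then
$$A = H\begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}H^H = [H_1, H_2]\begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} H_1^H \\ H_2^H \end{pmatrix}$$
Let $D_1^{1/2} = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$. Then
$$A = H_1D_1^{1/2}D_1^{1/2}H_1^H = BB^H$$
where $B = H_1D_1^{1/2}$ is $m \times r$ of rank r.

Now we prove (ii). Let C be an $m \times m$ nonsingular matrix whose first r columns are the columns of B in part (i). Then $C = [B, E]$, where E is $m \times (m-r)$, and
$$C\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}C^H = [B, E]\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} B^H \\ E^H \end{pmatrix} = BB^H = A$$
□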

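Corollary 35 Let A be an $m \times m$ non-negative definite complex matrix of rank r. Then (i) there is an $r \times m$ matrix B of rank r such that $A = B^HB$, and (ii) there is an $m \times m$ nonsingular matrix C such that
$$A = C^H\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}C$$
This is another complexification of Muirhead's theorem A9.4 [187].

Proof. This is a complexification of Muirhead's proof. First, we prove (i). Let $D_1 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_r^2)$, where $\lambda_1^2, \ldots, \lambda_r^2$ are the nonzero eigenvalues of A. Let H be an $m \times m$ unitary matrix such that
$$H^HAH = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_r^2, 0, \ldots, 0)$$
Partition H as $H = [H_1, H_2]$, where $H_1$ is $m \times r$ and $H_2$ is $m \times (m-r)$. Then
$$A = H\begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}H^H = [H_1, H_2]\begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} H_1^H \\ H_2^H \end{pmatrix}$$
Let $D_1^{1/2} = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$. Then
$$A = H_1D_1^{1/2}(D_1^{1/2})^HH_1^H = B^HB$$
where $B = (H_1D_1^{1/2})^H$ is $r \times m$ of rank r.

Now we prove (ii). Let C be an $m \times m$ nonsingular matrix whose first r rows are the rows of B in part (i). Then
$$C = \begin{pmatrix} B \\ E \end{pmatrix}$$
where E is $(m-r) \times m$, and
$$C^H\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}C = [B^H, E^H]\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} B \\ E \end{pmatrix} = [B^H, 0]\begin{pmatrix} B \\ E \end{pmatrix} = B^HB = A$$
□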


M.8  Unitary  Transformations 

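Theorem 122 Suppose that A and B are complex matrices, where $A \in \mathbb{C}^{k \times m}$ and $B \in \mathbb{C}^{k \times n}$, with $m \le n$. Then $AA^H = BB^H$ if and only if there is an $m \times n$ matrix H with $HH^H = I_m$ such that $AH = B$. This is a complexification of Muirhead's theorem A9.5 [187].

Proof. This is a complexification of Muirhead's proof. Suppose there is an $m \times n$ matrix H with $HH^H = I_m$ such that $AH = B$. Then
$$BB^H = (AH)(AH)^H = AHH^HA^H = AA^H$$
Now, suppose that $AA^H = BB^H$. Let C be a $k \times k$ nonsingular matrix such that
$$AA^H = BB^H = C\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}C^H$$
where $\mathrm{rank}(AA^H) = r$. Matrix C exists by Theorem 121. Let
$$D = C^{-1}A, \qquad E = C^{-1}B \qquad \text{(M.2)}$$
Then $DD^H = EE^H = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$, so the last $k-r$ rows of D and of E are zero, and
$$D = \begin{pmatrix} D_1 \\ 0 \end{pmatrix}, \qquad E = \begin{pmatrix} E_1 \\ 0 \end{pmatrix}$$
where $D_1$ is $r \times m$ and $E_1$ is $r \times n$ with $D_1D_1^H = E_1E_1^H = I_r$. Let $E_2$ be an $(n-r) \times n$ matrix such that
$$\tilde{E} = \begin{pmatrix} E_1 \\ E_2 \end{pmatrix} \qquad \text{(M.3)}$$
is an $n \times n$ unitary matrix. Let
$$\tilde{D} = \begin{pmatrix} D_1 & 0 \\ D_2 & D_3 \end{pmatrix}$$
where $D_2$ is $(n-r) \times m$ and $D_3$ is $(n-r) \times (n-m)$, chosen such that $\tilde{D}$ is an $n \times n$ unitary matrix (possible because the rows of $[D_1, 0]$ are orthonormal). Note that we use $D_1$, not $\tilde{D}_1$. Then, with each $\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$ below of size $k \times n$,
$$[D, 0] = \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}\tilde{D}, \qquad E = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}\tilde{E}$$
Notice that
$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} = [D, 0]\tilde{D}^H$$
since $\tilde{D}$ is unitary. Define
$$Q = \tilde{D}^H\tilde{E} \qquad \text{(M.4)}$$
$\tilde{E}$ is unitary, which implies $\tilde{E}\tilde{E}^H = I_n$, and $\tilde{D}$ is unitary, which implies $\tilde{D}\tilde{D}^H = \tilde{D}^H\tilde{D} = I_n$. Then
$$QQ^H = (\tilde{D}^H\tilde{E})(\tilde{D}^H\tilde{E})^H = \tilde{D}^H\tilde{E}\tilde{E}^H\tilde{D} = \tilde{D}^H\tilde{D} = I_n$$
Therefore Q is unitary. Let Q be partitioned as
$$Q = \begin{pmatrix} H \\ \Gamma \end{pmatrix} \qquad \text{(M.5)}$$
where H is $m \times n$ and $\Gamma$ is $(n-m) \times n$. Then $HH^H = I_m$, since Q is unitary and
$$QQ^H = \begin{pmatrix} I_m & 0 \\ 0 & I_{n-m} \end{pmatrix}$$
Then
$$C^{-1}B = E = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}\tilde{E} = [D, 0]\tilde{D}^H\tilde{E} = [D, 0]Q = [D, 0]\begin{pmatrix} H \\ \Gamma \end{pmatrix} = DH = C^{-1}AH$$
by equations M.2, M.4, M.5, and M.2. This implies $CC^{-1}B = CC^{-1}AH$, that is, $B = AH$, which is the result we are seeking. □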
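Theorem 123 Suppose that A and B are complex matrices, where $A \in \mathbb{C}^{m \times k}$ and $B \in \mathbb{C}^{n \times k}$, with $m \le n$. Then $A^HA = B^HB$ if and only if there exists some $H \in \mathbb{C}^{n \times m}$ such that $H^HH = I_m$ and $B = HA$. This is a corollary to a complexification of Muirhead's theorem A9.5 [187].

Proof. This is a slight modification of a complexification of Muirhead's proof. Suppose there is some $H \in \mathbb{C}^{n \times m}$ such that $H^HH = I_m$ and $B = HA$. Then
$$B^HB = (HA)^H(HA) = A^HH^HHA = A^HA$$
Now, suppose $A^HA = B^HB$. Let C be a $k \times k$ nonsingular matrix such that
$$A^HA = B^HB = C^H\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}C$$
where $\mathrm{rank}(A^HA) = r$. C exists by Corollary 35. Let
$$D = AC^{-1}, \qquad E = BC^{-1} \qquad \text{(M.6)}$$
Then
$$D^HD = (AC^{-1})^H(AC^{-1}) = (C^{-1})^HA^HAC^{-1} = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}_{k \times k}$$
and likewise $E^HE = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}_{k \times k}$. Hence the last $k-r$ columns of D and of E are zero, so $D = [D_1, 0]$ and $E = [E_1, 0]$, where $D_1$ is $m \times r$ and $E_1$ is $n \times r$ with $D_1^HD_1 = E_1^HE_1 = I_r$.

Let $E_2$ be an $n \times (n-r)$ matrix such that $\tilde{E} = [E_1, E_2]$ is an $n \times n$ unitary matrix. Let
$$\tilde{D} = \begin{pmatrix} D_1 & D_2 \\ 0 & D_3 \end{pmatrix}$$
where $D_2$ is $m \times (n-r)$ and $D_3$ is $(n-m) \times (n-r)$, such that $\tilde{D}$ is an $n \times n$ unitary matrix. Then
$$E = [E_1, E_2]\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}_{n \times k} = \tilde{E}\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}_{n \times k}$$
and
$$\begin{pmatrix} D \\ 0 \end{pmatrix}_{n \times k} = \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix} = \tilde{D}\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}_{n \times k}$$
Since $\tilde{D}$ is unitary, it follows that
$$E = \tilde{E}\tilde{D}^H\tilde{D}\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}_{n \times k} = \tilde{E}\tilde{D}^H\begin{pmatrix} D \\ 0 \end{pmatrix} = Q\begin{pmatrix} D \\ 0 \end{pmatrix} \qquad \text{(M.7)}$$
where $Q = \tilde{E}\tilde{D}^H$. We see that
$$Q^HQ = (\tilde{E}\tilde{D}^H)^H(\tilde{E}\tilde{D}^H) = \tilde{D}\tilde{E}^H\tilde{E}\tilde{D}^H = \tilde{D}\tilde{D}^H = I_n$$
Therefore Q is unitary. Partition Q as
$$Q = [H, P] \qquad \text{(M.8)}$$
where H is $n \times m$ and P is $n \times (n-m)$. Then $H^HH = I_m$, since Q is unitary and $Q^HQ = \begin{pmatrix} I_m & 0 \\ 0 & I_{n-m} \end{pmatrix}$. Then
$$BC^{-1} = E = Q\begin{pmatrix} D \\ 0 \end{pmatrix} = [H, P]\begin{pmatrix} D \\ 0 \end{pmatrix} = HD = HAC^{-1}$$
by equations M.6, M.7, M.8, and M.6. Thus $BC^{-1}C = HAC^{-1}C$, that is, $B = HA$, which is the result we want.

Note that there are two different matrices of the form $\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$ in this proof: one has dimensions $k \times k$ and the other has dimensions $n \times k$. The dimensions of the other matrices are $Q_{n \times n}$, $D_{m \times k}$, and $E_{n \times k}$. □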
Theorem 124 Let A be an $n \times m$ complex matrix of rank$(A) = m$, where $n \ge m$. Then (i) A can be written as $A = H_1B$, where $H_1$ is $n \times m$ with $H_1^HH_1 = I_m$ and B is $m \times m$ positive definite, and (ii) A can be written as
$$A = H\begin{pmatrix} I_m \\ 0 \end{pmatrix}B$$
where H is $n \times n$ unitary and B is $m \times m$ positive definite. This is a complexification of Muirhead's theorem A9.6 [187].

Proof. This is a complexification of Muirhead's proof. (i) Let B be such that $B^HB = A^HA$. This B exists by Theorem 119. B is the positive definite Hermitian square root of $A^HA$. By Theorem 123, A can be written as $A = H_1B$, where $H_1$ is $n \times m$ with $H_1^HH_1 = I_m$.

(ii) Let $H_1$ be the matrix in (i) such that $A = H_1B$, and choose an $n \times (n-m)$ matrix $H_2$ so that $H = [H_1, H_2]$ is an $n \times n$ unitary matrix. Then
$$A = H_1B = [H_1, H_2]\begin{pmatrix} I_m \\ 0 \end{pmatrix}B = H\begin{pmatrix} I_m \\ 0 \end{pmatrix}B$$
□

Theorem 125 Let A be an $n \times m$ complex matrix of rank$(A) = n$, where $m \ge n$. Then (i) A can be written as $A = BH_1$, where $H_1$ is $n \times m$ with $H_1H_1^H = I_n$ and B is $n \times n$ positive definite, and (ii) A can be written as $A = B[I_n, 0]H$, where H is $m \times m$ unitary and B is $n \times n$ positive definite. This is a corollary to a complexification of Muirhead's theorem A9.6 [187].

Proof. This is a modified version of a complexification of Muirhead's proof. (i) Let B be such that $BB^H = AA^H$. This B exists by Theorem 119. B is the positive definite Hermitian square root of $AA^H$. By Theorem 122, there exists an $n \times m$ matrix $H_1$ such that $H_1H_1^H = I_n$ and $A = BH_1$.

(ii) Let $H_1$ be the matrix in (i) such that $A = BH_1$, and choose an $(m-n) \times m$ matrix $H_2$ so that
$$H = \begin{pmatrix} H_1 \\ H_2 \end{pmatrix}$$
is an $m \times m$ unitary matrix. Then
$$A = BH_1 = B[I_n, 0]\begin{pmatrix} H_1 \\ H_2 \end{pmatrix} = B[I_n, 0]H$$
□

M.9  Cholesky  or  Bartlett  Decomposition 

Theorem 126 (Very Important!) If A is an $m \times m$ positive definite complex matrix, then there is a unique $m \times m$ upper triangular matrix T with positive diagonal elements such that $A = T^HT$. This is known as the Cholesky or Bartlett decomposition (see p. 134 of [259]). This is a complexification of Muirhead's theorem A9.7 [187].

Proof. This is a complexification of Muirhead's proof. We prove it by induction. When $m = 1$ and $A$ is positive definite, then $A > 0$ and there exists a $T$ such that $A = T^HT = \sqrt{A}\sqrt{A}$.

Suppose the result holds for positive definite matrices of size $m - 1$. Partition the $m \times m$ matrix $A$ as

$$A = \begin{bmatrix} A_{11} & a_{12} \\ a_{12}^H & a_{22} \end{bmatrix}$$

where $A_{11}$ is $(m - 1) \times (m - 1)$. Since $A_{11}$ is a leading principal submatrix of $A$, it is positive definite, so by the induction hypothesis there exists a unique $(m - 1) \times (m - 1)$ upper triangular matrix $T_{11}$ with positive (therefore real) diagonal elements such that $A_{11} = T_{11}^HT_{11}$. Suppose


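$$A = \begin{bmatrix} A_{11} & a_{12} \\ a_{12}^H & a_{22} \end{bmatrix} = \begin{bmatrix} T_{11}^H & 0 \\ x^H & y \end{bmatrix}\begin{bmatrix} T_{11} & x \\ 0 & y \end{bmatrix} = \begin{bmatrix} T_{11}^HT_{11} & T_{11}^Hx \\ x^HT_{11} & x^Hx + y^2 \end{bmatrix}$$

where $x$ is $(m - 1) \times 1$ and $y \in \mathbb{R}$. Thus $A_{11} = T_{11}^HT_{11}$. Solving for $x$, $a_{12}^H = x^HT_{11}$, which implies $x = (T_{11}^{-1})^Ha_{12}$. Also, $a_{22} = x^Hx + y^2$ implies

$$y^2 = a_{22} - x^Hx = a_{22} - a_{12}^HT_{11}^{-1}(T_{11}^{-1})^Ha_{12} = a_{22} - a_{12}^H(T_{11}^HT_{11})^{-1}a_{12} = a_{22} - a_{12}^HA_{11}^{-1}a_{12}$$

where $a_{12}$ is a column vector of dimension $(m - 1) \times 1$. Since $A$ is positive definite, $\det A = (\det A_{11})(a_{22} - a_{12}^HA_{11}^{-1}a_{12})$ with $\det A > 0$ and $\det A_{11} > 0$, so

$$a_{22} - a_{12}^HA_{11}^{-1}a_{12} > 0$$

The unique positive $y$ satisfying this is

$$y = \left(a_{22} - a_{12}^HA_{11}^{-1}a_{12}\right)^{1/2} > 0$$

We have thus found

$$T = \begin{bmatrix} T_{11} & x \\ 0 & y \end{bmatrix}$$

such that $A = T^HT$. □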
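The bordering step of this induction (solve $T_{11}^Hx = a_{12}$, then take $y = (a_{22} - x^Hx)^{1/2}$) can be run column by column. The sketch below is a minimal pure-Python illustration, assuming a small Hermitian positive definite input stored as nested lists; `cholesky_upper` and `conj_transpose_times` are hypothetical helper names introduced only for this example, and a practical computation would normally use a linear algebra library instead.

```python
import math


def cholesky_upper(A):
    """Upper-triangular T with positive real diagonal such that A = T^H T.

    Column j repeats the bordering step of the proof: solve T11^H x = a12 by
    forward substitution, then set y = sqrt(a22 - x^H x).
    """
    m = len(A)
    T = [[0j] * m for _ in range(m)]
    for j in range(m):
        # x = (T11^H)^{-1} a12, where T11^H is lower triangular.
        for i in range(j):
            s = A[i][j] - sum(T[k][i].conjugate() * T[k][j] for k in range(i))
            T[i][j] = s / T[i][i].conjugate()
        # y^2 = a22 - x^H x is real and positive when A is positive definite.
        y2 = (A[j][j] - sum(abs(T[k][j]) ** 2 for k in range(j))).real
        if y2 <= 0:
            raise ValueError("matrix is not positive definite")
        T[j][j] = complex(math.sqrt(y2))
    return T


def conj_transpose_times(T):
    """Return T^H T."""
    m = len(T)
    return [[sum(T[k][i].conjugate() * T[k][j] for k in range(m))
             for j in range(m)] for i in range(m)]


A = [[4, 2 - 2j], [2 + 2j, 6]]          # Hermitian positive definite example
T = cholesky_upper(A)
err = max(abs(x - y) for r1, r2 in zip(conj_transpose_times(T), A)
          for x, y in zip(r1, r2))
print(T, err)
```

Working one column at a time avoids explicit recursion while computing exactly the $x$ and $y$ of the proof at each stage.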

Corollary 36 If $A$ is an $m \times m$ positive definite complex matrix, then there is a unique $m \times m$ lower triangular matrix $L$ with positive diagonal elements such that $A = LL^H$. This is a corollary to a complexification of Muirhead's theorem A9.7 [187].

Proof. By theorem 126, there is a unique $m \times m$ upper triangular matrix $T$ with positive diagonal elements such that $A = T^HT$. Let $L = T^H$. Then $A = LL^H$, and $L$ is lower triangular with the same positive diagonal elements as $T$. Uniqueness of $L$ follows from uniqueness of $T$. □

Theorem 127 If $A$ is a complex $n \times m$ matrix of rank $m$ where $n > m$, then $A$ can be uniquely written as $A = H_1T$ where $H_1$ is $n \times m$ with $H_1^HH_1 = I_m$ and $T$ is $m \times m$ upper triangular with positive real diagonal elements. This is a complexification of Muirhead's theorem A9.8.

Proof. This is a complexification of Muirhead's proof. $A^HA$ is positive definite with dimensions $m \times m$. By theorem 126, there is a unique $m \times m$ upper triangular matrix $T$ with positive real diagonal elements such that $A^HA = T^HT$. By theorem 123, there exists an $n \times m$ matrix $H_1$ such that $H_1^HH_1 = I_m$ and $A = H_1T$. Since $A^HA$ is $m \times m$ and positive definite, $\operatorname{rank}(T) = m$, so $T$ is nonsingular and $H_1 = AT^{-1}$. Because $T$ is unique, we thus know $H_1$ is unique. □
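The proof of theorem 127 is constructive: take the Cholesky factor $T$ of $A^HA$ and set $H_1 = AT^{-1}$. A minimal pure-Python sketch of that construction follows, assuming $A$ has full column rank and is stored as nested lists; `cholesky_upper` and `factor_h1_t` are hypothetical names for this illustration only. Forming $A^HA$ explicitly is fine for small well-conditioned examples but loses accuracy in general, so this is not a recommended numerical method.

```python
import math


def cholesky_upper(A):
    """Upper-triangular T with positive real diagonal such that A = T^H T."""
    m = len(A)
    T = [[0j] * m for _ in range(m)]
    for j in range(m):
        for i in range(j):
            s = A[i][j] - sum(T[k][i].conjugate() * T[k][j] for k in range(i))
            T[i][j] = s / T[i][i].conjugate()
        y2 = (A[j][j] - sum(abs(T[k][j]) ** 2 for k in range(j))).real
        if y2 <= 0:
            raise ValueError("A^H A is not positive definite (rank(A) < m)")
        T[j][j] = complex(math.sqrt(y2))
    return T


def factor_h1_t(A):
    """Write an n x m matrix A of rank m as A = H1 T (theorem 127).

    T is the Cholesky factor of A^H A, and H1 = A T^{-1} is computed row by
    row by solving h T = a, stepping through the columns of upper-triangular T.
    """
    n, m = len(A), len(A[0])
    AhA = [[sum(A[r][i].conjugate() * A[r][j] for r in range(n))
            for j in range(m)] for i in range(m)]
    T = cholesky_upper(AhA)
    H1 = [[0j] * m for _ in range(n)]
    for r in range(n):
        for j in range(m):
            s = A[r][j] - sum(H1[r][k] * T[k][j] for k in range(j))
            H1[r][j] = s / T[j][j]
    return H1, T


A = [[1 + 1j, 2], [0.5, -1j], [3, 1]]     # 3 x 2 example of rank 2
H1, T = factor_h1_t(A)
print(T)
```

Here $H_1^HH_1 = T^{-H}A^HAT^{-1} = T^{-H}T^HTT^{-1} = I_m$, which is the orthonormality claimed in the theorem.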

Corollary 37 If $A$ is a complex $m \times n$ matrix of rank $m$ where $n > m$, then $A$ can be uniquely written as $A = LH_1$ where $H_1$ is $m \times n$ with $H_1H_1^H = I_m$ and $L$ is $m \times m$ lower triangular with positive real diagonal elements. This is a corollary to a complexification of Muirhead's theorem A9.8. This is also Srivastava lemma 1 [256].

Proof. This is a modified complexification of Muirhead's proof. $AA^H$ is positive definite with dimensions $m \times m$. By corollary 36, there is a unique lower triangular matrix $L$ with positive real diagonal elements such that $AA^H = LL^H$. Let $L$ in this proof be $A$ in theorem 122 and let $A$ in this proof be $B$ in theorem 122. Then by theorem 122, there exists an $m \times n$ matrix $H_1$ such that $H_1H_1^H = I_m$ and $A = LH_1$. Since $AA^H$ is positive definite, $\operatorname{rank}(L) = m$, so $L$ is nonsingular and $H_1 = L^{-1}A$. Because $L$ is unique, we know $H_1$ is unique. □

M.10 Eigenvalues of Simply Modified Matrices


Lemma 54 Let $\{t_i^2\}, i = 1, \ldots, p$ be the eigenvalues of a real or complex square matrix $X$ of dimension $p \times p$. Then the matrix $(I_p - X)$ has eigenvalues $\{1 - t_i^2\}, i = 1, \ldots, p$. This lemma was motivated by a comment by Arnold (p. 418, bottom) [31].

Proof. Let $\det(X - t^2I) = 0$. Consider the matrix $(I - X)$.

$$\det[(I - X) - \tau^2I] = 0$$

defines the eigenvalues of $(I - X)$. Then

$$\det[(I - X) - \tau^2I] = \det(-X + I - \tau^2I) = \det[-X + (1 - \tau^2)I] = 0$$

Since $\det(-M) = (-1)^p\det M$, changing the sign of the matrix still gives zero. Thus

$$\det[X - (1 - \tau^2)I] = 0 = \det[-X + (1 - \tau^2)I]$$

Comparing with $\det[X - t^2I] = 0$, we note that $1 - \tau^2 = t^2$, that is, $\tau^2 = 1 - t^2$. Therefore if $\{t_i^2, i = 1, \ldots, p\}$ are the eigenvalues of $X$, then $\{1 - t_i^2, i = 1, \ldots, p\}$ are the eigenvalues of $I - X$. □

Proposition 72 Let $\{t_i^2\}, i = 1, \ldots, p$ be the eigenvalues of a real or complex square matrix $X$ of dimension $p \times p$. Then the matrix $(I_p + X)$ has eigenvalues $\{1 + t_i^2\}, i = 1, \ldots, p$. This proposition was motivated by a comment by Arnold (p. 418, bottom) [31].

Proof. Let $\det(X - t^2I) = 0$. Consider the matrix $(I + X)$. $\det[(I + X) - \tau^2I] = 0$ defines the eigenvalues of $(I + X)$. Then

$$\det[(I + X) - \tau^2I] = \det(X + I - \tau^2I) = \det[X - (\tau^2 - 1)I] = 0$$

We observe that $\tau^2 - 1 = t^2$, which implies $\tau^2 = t^2 + 1$. Therefore $\{t_i^2 + 1\}_{i=1}^p$ are the eigenvalues of $I + X$. □

Proposition 73 Let $\{t_i^2\}, i = 1, \ldots, p$ be the eigenvalues of a real or complex square matrix $X$ of dimension $p \times p$. Then the matrix $(aI_p + X)$ has eigenvalues $\{a + t_i^2\}, i = 1, \ldots, p$. This proposition was motivated by a comment by Arnold (p. 418, bottom) [31].

Proof. Let $\det(X - t^2I) = 0$. Consider the matrix $(aI + X)$.

$$\det[(aI + X) - \tau^2I] = 0$$

defines the eigenvalues of $(aI + X)$. Then

$$\det[(aI + X) - \tau^2I] = \det(X + aI - \tau^2I) = \det[X - (\tau^2 - a)I] = 0$$

We observe that $\tau^2 - a = t^2$, which implies $\tau^2 = t^2 + a$. Therefore $\{t_i^2 + a\}_{i=1}^p$ are the eigenvalues of $aI_p + X$. □

Proposition 74 Let $\{t_i^2\}, i = 1, \ldots, p$ be the eigenvalues of a real or complex square matrix $X$ of dimension $p \times p$. Then the matrix $(aI_p + bX)$ has eigenvalues

$$\{a + bt_i^2\}, \quad i = 1, \ldots, p$$

This proposition was motivated by a comment by Arnold (p. 418, bottom) [31].

Proof. Let $\det(X - t^2I) = 0$. Consider the matrix $(aI + bX)$.

$$\det[(aI + bX) - \tau^2I] = 0$$

defines the eigenvalues of $(aI + bX)$. If $b = 0$ the result is immediate, so assume $b \neq 0$. Then

$$\det[(aI + bX) - \tau^2I] = \det(bX + aI - \tau^2I) = \det[bX - (\tau^2 - a)I] = b^p\det\left[X - \frac{\tau^2 - a}{b}I_p\right] = 0$$

defines the eigenvalues of $(aI + bX)$ in terms of the eigenvalues of $X$: $(\tau^2 - a)/b = t_i^2$. Thus $\tau_i^2 = a + bt_i^2$ are the eigenvalues of $(aI + bX)$. □
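Propositions 72 through 74 can be spot-checked numerically on a $2 \times 2$ example, where the eigenvalues come straight from the characteristic polynomial $\tau^2 - (\operatorname{tr} M)\tau + \det M$. The sketch below is a minimal illustration under that assumption; the matrix, the scalars $a, b$, and the helper names `eig2`, `affine`, and `matches` are all hypothetical choices made for this example.

```python
import cmath


def eig2(M):
    """Eigenvalues of a 2 x 2 complex matrix via its characteristic polynomial."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + disc) / 2, (tr - disc) / 2]


def affine(X, a, b):
    """Return aI + bX for a 2 x 2 matrix X."""
    return [[(a if i == j else 0) + b * X[i][j] for j in range(2)]
            for i in range(2)]


def matches(u, v, tol=1e-9):
    """True if two 2-element lists agree up to ordering."""
    same = all(abs(p - q) < tol for p, q in zip(u, v))
    swapped = all(abs(p - q) < tol for p, q in zip(u, v[::-1]))
    return same or swapped


X = [[1, 2j], [3, 4 - 1j]]
a, b = 2 - 1j, 3
predicted = [a + b * t for t in eig2(X)]       # proposition 74
print(matches(predicted, eig2(affine(X, a, b))))
```

Setting $b = 1$ recovers proposition 73, and $a = 1, b = \pm 1$ recovers proposition 72 and lemma 54.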


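Proposition 75 Let $X$ have eigenvalue decomposition $X = \sum_{k=1}^p \lambda_kP_kP_k^H$, where the $P_k$ are orthonormal. Then, provided $a + b\lambda_k \neq 0$ for every $k$, the inverse of the matrix $(aI + bX)$ is given by

$$(aI + bX)^{-1} = \sum_{k=1}^p (a + b\lambda_k)^{-1}P_kP_k^H$$

Proof. By proposition 74, the eigenvalues of $(aI + bX)$ are $\{a + b\lambda_k\}_{k=1}^p$. Since $\sum_{k=1}^p P_kP_k^H = I$,

$$aI + bX = \sum_{k=1}^p (a + b\lambda_k)P_kP_k^H$$

Multiplying this by $\sum_{k=1}^p (a + b\lambda_k)^{-1}P_kP_k^H$ gives $\sum_{k=1}^p P_kP_k^H = I$, because $P_j^HP_k = 0$ for $j \neq k$ and $P_k^HP_k = 1$. □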

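Theorem 128 Let $X$ have eigenvalue decomposition $X = \sum_{k=1}^p \lambda_kP_kP_k^H$, where the $P_k$ are orthonormal. Then

$$X = I - \sum_{k=1}^p (1 - \lambda_k)P_kP_k^H$$

Proof.

$$X = I - (I - X) = I - \left(\sum_{k=1}^p P_kP_k^H - \sum_{k=1}^p \lambda_kP_kP_k^H\right) = I - \sum_{k=1}^p (1 - \lambda_k)P_kP_k^H$$

This result is occasionally disguised. When $\lambda_k$ has the form $\lambda_k = 1/a_k$, then

$$1 - \frac{1}{a_k} = \frac{a_k - 1}{a_k}$$

Also, for

$$\lambda_k = \frac{1}{1 + b_k}$$

then

$$1 - \lambda_k = \frac{b_k}{1 + b_k} = \frac{1}{b_k^{-1} + 1}$$

is a form that appears in the literature. This comes from looking at the inverse of a matrix

$$Y = I + \sum_{k=1}^p b_kP_kP_k^H = I + B$$

□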
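Proposition 75 and the disguised form of theorem 128 can both be checked with an explicit orthonormal basis of $\mathbb{C}^2$. The minimal pure-Python sketch below assumes the basis $P_1 = (1, i)/\sqrt{2}$, $P_2 = (1, -i)/\sqrt{2}$ and arbitrarily chosen eigenvalues and $b_k$ values; all of these, and the helper names `outer`, `spectral_sum`, `matmul`, and `is_identity`, are hypothetical choices for illustration.

```python
import math


def outer(p):
    """Rank-one matrix p p^H for a column vector p given as a list."""
    return [[p[i] * p[j].conjugate() for j in range(len(p))] for i in range(len(p))]


def spectral_sum(weights, vecs):
    """Sum_k weights[k] * P_k P_k^H."""
    n = len(vecs[0])
    M = [[0j] * n for _ in range(n)]
    for w, p in zip(weights, vecs):
        O = outer(p)
        for i in range(n):
            for j in range(n):
                M[i][j] += w * O[i][j]
    return M


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_identity(M, tol=1e-9):
    n = len(M)
    return all(abs(M[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))


s = 1 / math.sqrt(2)
P = [[s, 1j * s], [s, -1j * s]]     # orthonormal P_1, P_2 (assumed example)
lam = [0.5, -2.0]                    # eigenvalues lambda_k of X (assumed example)
a, b = 1.0, 3.0                      # requires a + b * lambda_k != 0

X = spectral_sum(lam, P)
M = [[a * (i == j) + b * X[i][j] for j in range(2)] for i in range(2)]
M_inv = spectral_sum([1 / (a + b * l) for l in lam], P)       # proposition 75

bk = [0.25, 4.0]                                               # hypothetical b_k
B = spectral_sum(bk, P)
Y = [[(i == j) + B[i][j] for j in range(2)] for i in range(2)]
shrink = spectral_sum([x / (1 + x) for x in bk], P)
Y_inv = [[(i == j) - shrink[i][j] for j in range(2)] for i in range(2)]

print(is_identity(matmul(M, M_inv)), is_identity(matmul(Y, Y_inv)))
```

The second check is exactly theorem 128 with $\lambda_k = 1/(1 + b_k)$: $Y^{-1} = I - \sum_k \frac{b_k}{1 + b_k}P_kP_k^H$.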


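Proposition 76 Let $A = A^H$. Then $\operatorname{tr}(A^2) = \sum_{i=1}^p \lambda_i^2$ where the $\{\lambda_i\}_{i=1}^p$ are the eigenvalues of $A$.

Proof. Since $A = A^H$, $A$ has an eigenvalue decomposition $A = \Gamma\Lambda\Gamma^H$ where $\Lambda$ is the diagonal matrix of eigenvalues and $\Gamma \in U(p)$ is the matrix of eigenvectors. Then

$$A^2 = \Gamma\Lambda\Gamma^H\Gamma\Lambda\Gamma^H = \Gamma\Lambda^2\Gamma^H$$

Therefore, since $\operatorname{tr}(\Gamma\Lambda^2\Gamma^H) = \operatorname{tr}(\Lambda^2\Gamma^H\Gamma)$,

$$\operatorname{tr}(A^2) = \operatorname{tr}\Lambda^2 = \sum_{i=1}^p \lambda_i^2$$

□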


M.11 Singular Value Decomposition
Theorem 129 (Singular Value Decomposition, SVD). Let $A \in \mathbb{C}^{m \times n}$ with $\operatorname{rank}(A) = r$. Then $A$ can be decomposed as $A = P\Lambda Q^H$ where $P \in U(m)$, $Q \in U(n)$, and $\Lambda$ is an $m \times n$ matrix consisting of all zeros except for $r$ positive elements $(\lambda_1, \ldots, \lambda_r)$ on the main diagonal. Without loss of generality, assume $n \geq m$. This is a widely known result.


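Proof. (Gratefully taken from C. R. Rao, pp. 42-43 [213].) Recall that the $m \times m$ matrix $H = AA^H$ has the eigenvalue decomposition

$$H = \sum_{i=1}^m \lambda_i^2P_iP_i^H$$

with $\lambda_1, \ldots, \lambda_r > 0$ and $\lambda_{r+1} = \cdots = \lambda_m = 0$. From C. R. Rao [213], let $Q_i = \lambda_i^{-1}A^HP_i$ for $i = 1, \ldots, r$. Recall that the $\{P_i\}$ are orthonormal, which means

$$P_i^HP_k = \begin{cases} 1, & i = k \\ 0, & i \neq k \end{cases}$$

We then observe

$$Q_i^HQ_k = \lambda_i^{-1}\lambda_k^{-1}P_i^HAA^HP_k = \lambda_i^{-1}\lambda_k^{-1}P_i^H\left(\sum_{j=1}^m \lambda_j^2P_jP_j^H\right)P_k = \begin{cases} 1, & i = k \\ 0, & i \neq k \end{cases}$$

Therefore the $\{Q_i\}$ are also orthonormal.

With a slight rearrangement of $Q_i^H = \lambda_i^{-1}P_i^HA$, we note that $\lambda_iQ_i^H = P_i^HA$. For $i > r$, $\|A^HP_i\|^2 = P_i^HAA^HP_i = \lambda_i^2 = 0$, so $P_i^HA = 0$.

We also note that given any set of orthonormal vectors, we can complete that set to form an orthonormal basis for its space. Thus we can take $\{P_i\}_{i=1}^m$ in $\mathbb{C}^m$ so that

$$\sum_{i=1}^m P_iP_i^H = I_m$$

This allows us to state

$$A = I_mA = (P_1P_1^H + \cdots + P_mP_m^H)A$$

We substitute the relationship for $\{Q_i\}$ to obtain

$$A = P_1P_1^HA + \cdots + P_mP_m^HA = \lambda_1P_1Q_1^H + \cdots + \lambda_rP_rQ_r^H$$

since $P_i^HA = 0$ for $i > r$, where $\lambda_{r+1} = \cdots = \lambda_m = 0$.

We know $\{Q_i\}_{i=1}^r$ are orthonormal. Since $n \geq m \geq r$, we can extend this set to $\{Q_i\}_{i=1}^n$ to form an orthonormal basis for $\mathbb{C}^n$. Thus $P = (P_1, \ldots, P_m)$ and $Q = (Q_1, \ldots, Q_n)$ are unitary matrices where $P \in U(m)$ and $Q \in U(n)$. We can rewrite the expansion of $A$ into

$$A = (P_1, \ldots, P_r, P_{r+1}, \ldots, P_m)\begin{bmatrix} \operatorname{diag}(\lambda_1, \ldots, \lambda_r) & 0 \\ 0 & 0 \end{bmatrix}(Q_1, \ldots, Q_n)^H = P\Lambda Q^H$$

where $P \in \mathbb{C}^{m \times m}$, $\Lambda \in \mathbb{C}^{m \times n}$, $Q \in \mathbb{C}^{n \times n}$. □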


Appendix N

TRIGONOMETRY OF COMPLEX MATRICES

This was written just for the fun of it. It was motivated by looking at eigenvalue decompositions while playing with the zonal polynomial questions, and recalling the Cayley-Hamilton theorem, which says that a matrix satisfies its own characteristic equation. This led to looking at other functions of a matrix. The work presented here has potential application when the CS decomposition is used, such as using the matrices C and S in section 6.3.4 of Tague's thesis [263]. Much of the early part of this chapter is a complexification of material from the fine work by Curtis (pp. 45 ff.) [64]. In this case, “Curtis” is the author's family name, not his Christian name.


N.1 Matrix Exponential and Logarithm Properties


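Definition 80 Exponential of a Matrix. Let $A$ be a complex $n \times n$ matrix and define

$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$$

This series converges if each of the $n^2$ complex number series

$$\sum_{k=0}^\infty \frac{1}{k!}(A^k)_{ij}, \quad 1 \leq i, j \leq n$$

converges. This defines a mapping

$$\exp : M_n(\mathbb{C}) \to GL(n, \mathbb{C}) \subset M_n(\mathbb{C})$$

This definition is from Curtis [64].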

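Proposition 77 For any complex $n \times n$ matrix $A$, the series

$$I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$$

converges. This is a complexification of Curtis' proposition 1 [64].

Proof. Let $m$ be the largest $|a_{ij}|$ in $A$. Then the element of largest magnitude in the first term is 1. The element of largest magnitude in the second term is $m$. Since each entry of $A^2$ is a sum of $n$ products of two entries of $A$, the element of largest magnitude in the third term is $\leq nm^2/2!$. Likewise, the element of largest magnitude in the fourth term is $\leq n^2m^3/3!$, and so on. Any $ij$ sequence is dominated by

$$1, \; m, \; \frac{nm^2}{2!}, \; \frac{n^2m^3}{3!}, \; \ldots, \; \frac{n^{k-2}m^{k-1}}{(k-1)!}, \; \ldots$$

Applying the ratio test to this sequence gives

$$\frac{n^{k-1}m^k/k!}{n^{k-2}m^{k-1}/(k-1)!} = \frac{(k-1)!}{k!}\,nm = \frac{nm}{k}$$

Since $m$ and $n$ are fixed, the ratio goes to zero as $k \to \infty$, proving absolute convergence. □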
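Because the ratio of successive dominating terms behaves like $nm/k$, a modest number of series terms already gives machine precision for matrices with small entries. The following minimal pure-Python sketch simply truncates the series of definition 80; the term count of 40 and the names `matmul` and `mat_exp` are assumptions for this illustration, and library routines typically use scaling-and-squaring rather than a raw Taylor sum.

```python
import cmath


def matmul(A, B):
    """Product of two n x n matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=40):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!.

    Each term is built from the previous one as term_k = term_{k-1} A / k,
    so no factorials or explicit powers are formed.
    """
    n = len(A)
    term = [[complex(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[s + t for s, t in zip(srow, trow)]
                 for srow, trow in zip(total, term)]
    return total


A = [[0.3, 1j], [0.5, -0.2]]
minus_A = [[-x for x in row] for row in A]
P = matmul(mat_exp(A), mat_exp(minus_A))     # close to I (compare proposition 81)
D = mat_exp([[1, 0], [0, 1j]])               # diagonal case (compare lemma 56)
print(P, abs(D[1][1] - cmath.exp(1j)))
```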
Proposition 78 If $0$ is the zero matrix, then $e^0 = I$.

Proposition 79 If $A$ is an $n \times n$ complex Hermitian matrix ($A = A^H$), then $A^k$ is also an $n \times n$ complex Hermitian matrix for every positive integer $k$.

Proof. Since $A$ is Hermitian,

$$(A^k)^H = (A \cdots A)^H = A^H \cdots A^H = (A^H)^k = A^k$$

□

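Proposition 80 For any $n \times n$ complex Hermitian matrix $A$, $e^A$ is also an $n \times n$ complex Hermitian matrix.

Proof. By proposition 79, each partial sum

$$I + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{k!}A^k$$

is a linear combination, with real coefficients, of $n \times n$ complex Hermitian matrices. Therefore each partial sum is an $n \times n$ complex Hermitian matrix, and so is its limit $e^A$. □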

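Lemma 55 If the matrices $A$ and $B$ commute, then

$$e^{A+B} = e^Ae^B$$

This is Curtis' proposition 2 [64].

Proof.

$$e^{A+B} = I + (A + B) + \frac{1}{2!}(A + B)^2 + \frac{1}{3!}(A + B)^3 + \cdots$$

$$= I + A + B + \frac{1}{2}A^2 + \frac{1}{2}AB + \frac{1}{2}BA + \frac{1}{2}B^2$$

$$+ \frac{1}{6}A^3 + \frac{1}{6}A^2B + \frac{1}{6}ABA + \frac{1}{6}AB^2 + \frac{1}{6}BA^2 + \frac{1}{6}BAB + \frac{1}{6}B^2A + \frac{1}{6}B^3 + \cdots$$

Note that $(A + B)^2 = A^2 + AB + BA + B^2$ even if $AB \neq BA$. When $AB = BA$, the above simplifies to

$$e^{A+B} = I + A + B + \frac{1}{2}A^2 + AB + \frac{1}{2}B^2 + \frac{1}{6}A^3 + \frac{1}{2}A^2B + \frac{1}{2}AB^2 + \frac{1}{6}B^3 + \cdots$$

Continuing,

$$e^Ae^B = \left(I + A + \frac{1}{2}A^2 + \frac{1}{6}A^3 + \cdots\right)\left(I + B + \frac{1}{2}B^2 + \frac{1}{6}B^3 + \cdots\right)$$

$$= I + A + B + \frac{1}{2}A^2 + AB + \frac{1}{2}B^2 + \frac{1}{6}A^3 + \frac{1}{2}A^2B + \frac{1}{2}AB^2 + \frac{1}{6}B^3 + \cdots$$

When $AB = BA$, the two expansions agree term by term, so

$$e^Ae^B = e^{A+B}$$

□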

Proposition 81 $e^A$ is nonsingular. This is a complexification of corollary 1 to Curtis' proposition 2 [64].

Proof. Let $A \in M_n(\mathbb{C})$. $A$ and $-A$ commute with respect to multiplication. By lemma 55,

$$I = e^0 = e^{A + (-A)} = e^Ae^{-A}$$

and thus

$$\det(I) = 1 = (\det e^A)(\det e^{-A})$$

which implies $\det(e^A) \neq 0$. Therefore $e^A$ is nonsingular. □

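Proposition 82 If $A = -A^H$ ($A$ is skew-Hermitian), then $e^A$ is unitary. This is a complexification of Curtis' proposition 3 [64].

Proof. Conjugate transposition can be applied term by term to the exponential series, so $(e^A)^H = e^{A^H} = e^{-A}$. Since $A$ and $-A$ commute, lemma 55 gives

$$I = e^0 = e^{A - A} = e^Ae^{-A} = e^Ae^{A^H} = (e^A)(e^A)^H$$

Thus $e^A$ is unitary when $A$ is skew-Hermitian. □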
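Proposition 82 is easy to observe numerically: exponentiate a skew-Hermitian matrix and check that $UU^H \approx I$. The sketch below reuses a truncated Taylor series and assumes the example matrix shown; the names `matmul`, `mat_exp`, and `conj_transpose`, and the choice of 60 terms, are hypothetical choices for this illustration.

```python
def matmul(A, B):
    """Product of two n x n matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=60):
    """Truncated series I + A + A^2/2! + ... (definition 80)."""
    n = len(A)
    term = [[complex(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[s + t for s, t in zip(srow, trow)]
                 for srow, trow in zip(total, term)]
    return total


def conj_transpose(M):
    """Return M^H."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]


A = [[1j, 2 + 1j], [-2 + 1j, -0.5j]]     # satisfies A^H = -A (assumed example)
U = mat_exp(A)
print(matmul(U, conj_transpose(U)))      # close to the identity
```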

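Theorem 130 If $A, B$ are $n \times n$ complex matrices and $B$ is nonsingular, then

$$e^{BAB^{-1}} = Be^AB^{-1}$$

and

$$\det e^{BAB^{-1}} = \det e^A$$

This is Curtis' proposition 4 [64].

Proof. This proof is by Curtis [64].

$$(BAB^{-1})^k = (BAB^{-1})(BAB^{-1})\cdots(BAB^{-1}) = BA^kB^{-1}$$

$$e^{BAB^{-1}} = I + BAB^{-1} + \frac{1}{2!}(BAB^{-1})^2 + \frac{1}{3!}(BAB^{-1})^3 + \cdots$$

$$= I + BAB^{-1} + \frac{1}{2!}BA^2B^{-1} + \frac{1}{3!}BA^3B^{-1} + \cdots$$

$$= B\left(I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots\right)B^{-1} = Be^AB^{-1}$$

$$\det e^{BAB^{-1}} = \det(Be^AB^{-1}) = (\det B)(\det e^A)(\det B^{-1}) = \det e^A$$

□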

Corollary 38 If $A, B$ are $n \times n$ complex matrices and $B$ is unitary, then

$$\det e^{BAB^H} = \det e^A$$

and

$$e^{BAB^H} = Be^AB^H$$

Proof. Substitute $B^H = B^{-1}$ into the proof of theorem 130. □


Definition 81 Let $X \in M_n(\mathbb{C})$. Define

$$\log(X) = (X - I) - \frac{1}{2}(X - I)^2 + \frac{1}{3}(X - I)^3 - \frac{1}{4}(X - I)^4 + \cdots$$

This is a complexification of a definition given by Curtis, p. 49 [64].


Proposition 83 Let $X \in M_n(\mathbb{C})$. Then $\log(X)$ converges when the magnitude of the largest element of $X - I$ is less than $1/n$. Note: $X$ cannot be the zero matrix. This is a complexification of proposition 5 of Curtis [64].
Theorem 131 In $M_n(\mathbb{C})$, let $U$ be a neighborhood of $I$ in which log is defined, and let $V$ be a neighborhood of zero such that $\exp(V)$ is contained in $U$. Then (i) for $A \in V$, $\log e^A = A$, and (ii) for $X \in U$, $e^{\log X} = X$. This is a complexification of proposition 6 of Curtis [64].


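Proof. This proof is by Curtis [64], except that the matrices are now complex-valued rather than real-valued.

(i) $A \in V$ implies $e^A \in U$ by hypothesis. This implies that $\log e^A$ exists. So,

$$\log e^A = (e^A - I) - \frac{1}{2}(e^A - I)^2 + \frac{1}{3}(e^A - I)^3 - \cdots$$

$$= \left(A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots\right) - \frac{1}{2}\left(A + \frac{1}{2!}A^2 + \cdots\right)^2 + \frac{1}{3}\left(A + \cdots\right)^3 - \cdots$$

$$= A + \left(\frac{1}{2} - \frac{1}{2}\right)A^2 + \left(\frac{1}{6} - \frac{1}{2} + \frac{1}{3}\right)A^3 + \cdots = A + 0 + 0 + 0 + \cdots = A$$

(ii)

$$\log(X) = (X - I) - \frac{1}{2}(X - I)^2 + \frac{1}{3}(X - I)^3 - \cdots$$

which implies

$$e^{\log X} = I + \left\{(X - I) - \frac{1}{2}(X - I)^2 + \cdots\right\} + \frac{1}{2}\left\{(X - I) - \frac{1}{2}(X - I)^2 + \cdots\right\}^2 + \cdots$$

$$= X - \frac{1}{2}(X - I)^2 + \frac{1}{3}(X - I)^3 + \left\{\frac{1}{2}(X - I)^2 - \frac{1}{2}(X - I)^3 + \frac{1}{6}(X - I)^3\right\} + \cdots$$

$$= X + 0 + 0 + \cdots = X$$

□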
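Theorem 131 can be illustrated by truncating both series and composing them for a matrix close to $I$ (and for one close to $0$). The sketch below is a minimal pure-Python version, assuming the example matrices shown lie well inside the regions where the series converge; the names `matmul`, `mat_exp`, and `mat_log` and the term counts are hypothetical choices for this illustration.

```python
def matmul(A, B):
    """Product of two n x n matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=40):
    """Truncated series I + A + A^2/2! + ... (definition 80)."""
    n = len(A)
    term = [[complex(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[s + t for s, t in zip(srow, trow)]
                 for srow, trow in zip(total, term)]
    return total


def mat_log(X, terms=200):
    """Truncated series (X-I) - (X-I)^2/2 + (X-I)^3/3 - ... (definition 81)."""
    n = len(X)
    D = [[X[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    power = [row[:] for row in D]
    total = [[0j] * n for _ in range(n)]
    for k in range(1, terms + 1):
        coef = (-1) ** (k + 1) / k
        total = [[s + coef * p for s, p in zip(srow, prow)]
                 for srow, prow in zip(total, power)]
        power = matmul(power, D)
    return total


X = [[1.1, 0.05j], [0.02, 0.95]]       # near I (assumed example)
A = [[0.1, 0.05j], [-0.02, 0.03]]      # near 0 (assumed example)
print(mat_exp(mat_log(X)))             # theorem 131(ii): close to X
print(mat_log(mat_exp(A)))             # theorem 131(i): close to A
```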

Corollary 39 Let $U$ be a neighborhood of $I$ in $M_n(\mathbb{C})$ in which log is defined. Let $X, Y \in U$. Let $\log X$ and $\log Y$ commute. Then

$$\log(XY) = \log X + \log Y$$

This is a complexification of part 1 of Curtis' proposition 7 [64].

Proof. This is essentially the proof by Curtis [64] where the matrices are now understood to be complex. $e^{\log(XY)} = XY$ by theorem 131. $XY = e^{\log X}e^{\log Y}$, also by theorem 131. Since $\log X$ and $\log Y$ commute,

$$e^{\log X}e^{\log Y} = e^{\log X + \log Y}$$

by lemma 55. Taking logarithms and applying theorem 131(i), with $XY \in U$ and $\log X + \log Y \in V$, gives $\log(XY) = \log X + \log Y$. □

Corollary 40 Let unitary $X$ be in a neighborhood of $I$ in $M_n(\mathbb{C})$ in which log is defined. Then $\log X$ is skew-Hermitian. This is a complexification of part 2 of Curtis' proposition 7 [64].

Proof. This is a complexification of the proof by Curtis [64]. Since $X$ is unitary, $X^HX = XX^H = I$. Thus $X$ and $X^H$ commute, which implies $\log X$ and $\log X^H$ commute. Since the log series can be conjugate-transposed term by term, $\log X^H = (\log X)^H$. Then, by corollary 39,

$$0 = \log(I) = \log(XX^H) = \log X + \log X^H = \log X + (\log X)^H$$

This implies $\log X = -(\log X)^H$, showing that $\log X$ is skew-Hermitian.

Remark. This remark is supplied by me. The matrix functions exp and log are not simple generalizations of the univariate case. For example,

$$\frac{d}{dY}\log Y \neq Y^{-1}$$

even when $Y^{-1}$ exists. This is easy to see since $Y^{-1}$ is an $n \times n$ matrix, while $\frac{d}{dY}\log Y$ is an $n^2 \times n^2$ matrix. Let us see what $\frac{d}{dY}\log Y$ is.

$$\exp(\log Y) = Y \in \text{Neighborhood}(I)$$

$$\log(\exp X) = X \in \text{Neighborhood}(0)$$

$$\frac{d}{dX}\log(\exp X) = \frac{dX}{dX} = E_nE_n^T \neq I_{n^2}$$

Treating $\log(\exp X)$ as a composition of functions $\log \circ \exp(X)$, we get

$$\frac{d}{dX}\log(\exp X) = \left[\frac{d}{dY}\log Y\right]\left[\frac{d}{dX}\exp X\right] = E_nE_n^T$$

Thus

$$\frac{d}{dY}\log Y = E_nE_n^T\left[\frac{d}{dX}\exp X\right]^{-1}$$

when this inverse exists. □


N.2  Matrix  COS  and  SIN  Functions 


Work  in  this  section  is  supplied  by  me. 


Definition 82 Define the matrix cosine function as

$$C(X) = \frac{1}{2}[\exp(iX) + \exp(-iX)] \tag{N.1}$$

and the matrix sine function as

$$S(X) = \frac{1}{2i}[\exp(iX) - \exp(-iX)] \tag{N.2}$$

Thus

$$\exp(iX) = C(X) + iS(X) \tag{N.3}$$

Note that

$$C(0) = I \quad \text{and} \quad S(0) = 0 \tag{N.4}$$

Unlike the univariate case,

$$\frac{d}{dX}C(X) \neq -S(X) \tag{N.5}$$

and

$$\frac{d}{dX}S(X) \neq C(X) \tag{N.6}$$

The derivative matrices are $n^2 \times n^2$, whereas $C(X)$ and $S(X)$ are $n \times n$ matrices.
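Proposition 84 (a)

$$S(A) = A - \frac{1}{3!}A^3 + \frac{1}{5!}A^5 - \cdots$$

and (b)

$$C(A) = I - \frac{1}{2!}A^2 + \frac{1}{4!}A^4 - \cdots$$

Proof. For part (a):

$$S(A) = \frac{1}{2i}(e^{iA} - e^{-iA}) = \frac{1}{2i}\sum_{k=0}^\infty \frac{1}{k!}(iA)^k - \frac{1}{2i}\sum_{k=0}^\infty \frac{1}{k!}(-iA)^k = \frac{1}{2i}\sum_{k=0}^\infty \frac{1}{k!}\left[1 - (-1)^k\right](iA)^k$$

Note that

$$1 - (-1)^k = \begin{cases} 0, & k \text{ even} \\ 2, & k \text{ odd} \end{cases}$$

Then

$$S(A) = \frac{1}{2i}\left(2iA - \frac{2i}{3!}A^3 + \frac{2i}{5!}A^5 - \cdots\right) = A - \frac{1}{3!}A^3 + \frac{1}{5!}A^5 - \cdots = \sum_{k=1}^\infty \frac{(-1)^{k+1}}{(2k-1)!}A^{2k-1}$$

For part (b):

$$C(A) = \frac{1}{2}(e^{iA} + e^{-iA}) = \frac{1}{2}\sum_{k=0}^\infty \frac{1}{k!}\left[1 + (-1)^k\right](iA)^k$$

where

$$1 + (-1)^k = \begin{cases} 0, & k \text{ odd} \\ 2, & k \text{ even} \end{cases}$$

so that

$$C(A) = \sum_{k=0}^\infty \frac{(iA)^{2k}}{(2k)!} = \sum_{k=0}^\infty \frac{(-1)^k}{(2k)!}A^{2k} = I - \frac{1}{2!}A^2 + \frac{1}{4!}A^4 - \cdots$$

□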


Proposition 85 Let $X \in M_n(\mathbb{C})$ be an $n \times n$ complex matrix, and let $C(X)$ and $S(X)$ be defined as in definition 82. Then

$$C^2(X) + S^2(X) = I_n$$

Proof. Note that $X$ and $-X$ commute under matrix multiplication, which allows us to use lemma 55.

$$C^2(X) + S^2(X) = \frac{1}{2}[\exp(iX) + \exp(-iX)]\,\frac{1}{2}[\exp(iX) + \exp(-iX)] + \frac{1}{2i}[\exp(iX) - \exp(-iX)]\,\frac{1}{2i}[\exp(iX) - \exp(-iX)]$$

$$= \frac{1}{4}[\exp(i2X) + 2\exp(iX - iX) + \exp(-i2X)] - \frac{1}{4}[\exp(i2X) - 2\exp(iX - iX) + \exp(-i2X)]$$

$$= \exp(i0) = I_n$$

□
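Definition 82 and proposition 85 can be checked numerically by building $C(X)$ and $S(X)$ from truncated exponentials. The minimal pure-Python sketch below assumes the example matrix shown; `matmul`, `mat_exp`, `mat_cos`, and `mat_sin` are hypothetical helper names for this illustration, and the last line also spot-checks theorem 134 with $a = \pi/3$.

```python
import math


def matmul(A, B):
    """Product of two n x n matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=60):
    """Truncated series I + A + A^2/2! + ... (definition 80)."""
    n = len(A)
    term = [[complex(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[s + t for s, t in zip(srow, trow)]
                 for srow, trow in zip(total, term)]
    return total


def mat_cos(X):
    """C(X) = [exp(iX) + exp(-iX)] / 2, equation (N.1)."""
    Ep = mat_exp([[1j * x for x in row] for row in X])
    Em = mat_exp([[-1j * x for x in row] for row in X])
    return [[(p + m) / 2 for p, m in zip(pr, mr)] for pr, mr in zip(Ep, Em)]


def mat_sin(X):
    """S(X) = [exp(iX) - exp(-iX)] / (2i), equation (N.2)."""
    Ep = mat_exp([[1j * x for x in row] for row in X])
    Em = mat_exp([[-1j * x for x in row] for row in X])
    return [[(p - m) / 2j for p, m in zip(pr, mr)] for pr, mr in zip(Ep, Em)]


X = [[0.4, 1 - 1j], [0.3j, -0.7]]
C, S = mat_cos(X), mat_sin(X)
pythagoras = [[a + b for a, b in zip(ra, rb)]
              for ra, rb in zip(matmul(C, C), matmul(S, S))]
half = mat_cos([[math.pi / 3, 0], [0, math.pi / 3]])   # theorem 134: about 0.5 I
print(pythagoras, half)
```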


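Lemma 56 Let $D = \operatorname{diag}(d_1, d_2, \ldots, d_n)$ be an $n \times n$ diagonal complex matrix. Then

$$e^D = \operatorname{diag}(\exp(d_1), \ldots, \exp(d_n))$$

Proof.

$$e^D = I + D + \frac{1}{2!}D^2 + \frac{1}{3!}D^3 + \cdots$$

$$= \operatorname{diag}\left\{\left(1 + d_i + \frac{1}{2!}d_i^2 + \frac{1}{3!}d_i^3 + \cdots\right), \; i = 1, \ldots, n\right\}$$

$$= \operatorname{diag}(e^{d_1}, \ldots, e^{d_n})$$

□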

Theorem 132 Let $a$ be a complex scalar. Then $e^{aI} = e^aI$.

Proof.  This  follows  directly  from  lemma  56.  □ 

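Theorem 133 Let $a$ be a complex scalar and $D = \operatorname{diag}(d_1, \ldots, d_n)$ be an $n \times n$ diagonal complex matrix. Then

$$e^{aD} = \operatorname{diag}(e^{ad_1}, \ldots, e^{ad_n})$$

Proof. Since $aD$ is diagonal, this follows directly from lemma 56. □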


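Theorem 134 Let $a$ be a complex scalar. Then

$$C(aI) = [\cos(a)]I$$

and

$$S(aI) = [\sin(a)]I$$

where cos and sin (with lower case c and s) are the usual scalar trigonometric functions.

Proof. By theorem 132,

$$C(aI) = \frac{1}{2}\left(e^{iaI} + e^{-iaI}\right) = \frac{1}{2}\left(e^{ia}I + e^{-ia}I\right) = \frac{1}{2}\left(e^{ia} + e^{-ia}\right)I = [\cos(a)]I$$

Similarly,

$$S(aI) = [\sin(a)]I$$

□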


Corollary 41 $C(\frac{\pi}{2}I) = 0$ and $S(\frac{\pi}{2}I) = I$.

Proof. This follows directly from theorem 134. □

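Theorem 135 Let $A$ and $B$ commute under multiplication and $A, B \in M_n(\mathbb{C})$. Then

$$S(A + B) = S(A)C(B) + C(A)S(B)$$

$$S(A - B) = S(A)C(B) - C(A)S(B)$$

$$C(A + B) = C(A)C(B) - S(A)S(B)$$

$$C(A - B) = C(A)C(B) + S(A)S(B)$$

Proof:

$$S(A)C(B) + C(A)S(B) = \frac{1}{4i}\left[(e^{iA} - e^{-iA})(e^{iB} + e^{-iB}) + (e^{iA} + e^{-iA})(e^{iB} - e^{-iB})\right]$$

$$= \frac{1}{4i}\left(e^{iA}e^{iB} + e^{iA}e^{-iB} - e^{-iA}e^{iB} - e^{-iA}e^{-iB} + e^{iA}e^{iB} - e^{iA}e^{-iB} + e^{-iA}e^{iB} - e^{-iA}e^{-iB}\right)$$

$$= \frac{1}{4i}\left(2e^{iA}e^{iB} - 2e^{-iA}e^{-iB}\right) = \frac{1}{2i}\left(e^{iA}e^{iB} - e^{-iA}e^{-iB}\right)$$

Invoking lemma 55 ($iA$ and $iB$ commute), we get

$$\frac{1}{2i}\left(e^{i(A+B)} - e^{-i(A+B)}\right) = S(A + B)$$

The other identities are proven in a similar fashion. □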
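The commutativity hypothesis of theorem 135 is essential, and a small experiment makes this visible. In the sketch below, $B$ is a polynomial in $A$ (so $AB = BA$) and the addition formula holds to rounding error, while the nilpotent pair $N_1, N_2$ does not commute and the formula fails: $S(N_1 + N_2) = \sin(1)(N_1 + N_2)$ but $S(N_1)C(N_2) + C(N_1)S(N_2) = N_1 + N_2$. The matrices and the helper names (`mat_exp`, `mat_cos`, `mat_sin`, `addition_gap`, and so on) are assumptions made for this pure-Python illustration.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_exp(A, terms=60):
    """Truncated series I + A + A^2/2! + ... (definition 80)."""
    n = len(A)
    term = [[complex(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = add(total, term)
    return total


def mat_cos(X):
    Ep, Em = mat_exp([[1j * x for x in r] for r in X]), mat_exp([[-1j * x for x in r] for r in X])
    return [[(p + m) / 2 for p, m in zip(pr, mr)] for pr, mr in zip(Ep, Em)]


def mat_sin(X):
    Ep, Em = mat_exp([[1j * x for x in r] for r in X]), mat_exp([[-1j * x for x in r] for r in X])
    return [[(p - m) / 2j for p, m in zip(pr, mr)] for pr, mr in zip(Ep, Em)]


def addition_gap(A, B):
    """Largest entry of |S(A+B) - [S(A)C(B) + C(A)S(B)]|."""
    lhs = mat_sin(add(A, B))
    rhs = add(matmul(mat_sin(A), mat_cos(B)), matmul(mat_cos(A), mat_sin(B)))
    return max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))


A = [[0.2, 0.5j], [0.1, -0.3]]
B = add(matmul(A, A), [[0.5 * x for x in row] for row in A])   # polynomial in A
N1, N2 = [[0, 1], [0, 0]], [[0, 0], [1, 0]]                     # N1 N2 != N2 N1
print(addition_gap(A, B), addition_gap(N1, N2))
```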


Theorem 136 If $A$ and $B$ commute, then $S(A)$ and $C(B)$ commute. That is, if $AB = BA$, then

$$S(A)C(B) = C(B)S(A)$$

Proof. From the proof of theorem 135,

$$S(A)C(B) = \frac{1}{4i}\left(e^{iA}e^{iB} + e^{iA}e^{-iB} - e^{-iA}e^{iB} - e^{-iA}e^{-iB}\right)$$

Now invoke $e^{iA}e^{iB} = e^{iB}e^{iA}$ (and likewise for the other sign combinations) from lemma 55. This gives us

$$\frac{1}{4i}\left(e^{iB}e^{iA} + e^{-iB}e^{iA} - e^{iB}e^{-iA} - e^{-iB}e^{-iA}\right) = \frac{1}{2}\left(e^{iB} + e^{-iB}\right)\frac{1}{2i}\left(e^{iA} - e^{-iA}\right) = C(B)S(A)$$

□


Corollary 42 $S(A)$ and $C(A)$ commute.

Proof. $A$ commutes with itself. Let $B = A$ in theorem 136. □

Theorem 137

(a) $S(2A) = 2S(A)C(A)$

(b) $C(2A) = [C(A)]^2 - [S(A)]^2 = I - 2[S(A)]^2 = 2[C(A)]^2 - I$

(c) $S(3A) = 3S(A) - 4[S(A)]^3$

(d) $C(3A) = 4[C(A)]^3 - 3C(A)$

(e) $[S(A)]^2 = \frac{1}{2}[I - C(2A)]$

(f) $[C(A)]^2 = \frac{1}{2}[I + C(2A)]$

Proof. The proof of these is strictly mundane, made possible by theorems 135 and 136 and proposition 85. Note that $A$ commutes with itself and with $2A$. For example, look at (c).

$$S(3A) = S(A + 2A) = S(A)C(2A) + C(A)S(2A)$$

$$= S(A)[C^2(A) - S^2(A)] + C(A)[S(A)C(A) + C(A)S(A)]$$

$$= S(A)C^2(A) - S^3(A) + S(A)C^2(A) + S(A)C^2(A)$$

$$= 3S(A)C^2(A) - S^3(A) = 3S(A)[I - S^2(A)] - S^3(A)$$

$$= 3S(A) - 3S^3(A) - S^3(A) = 3S(A) - 4S^3(A)$$

Proof of the other parts follows similar mechanics. □


Theorem 138 If $A$ and $B$ commute, then

(a) $S(A) + S(B) = 2S[\frac{1}{2}(A + B)]C[\frac{1}{2}(A - B)]$

(b) $S(A) - S(B) = 2C[\frac{1}{2}(A + B)]S[\frac{1}{2}(A - B)]$

(c) $C(A) + C(B) = 2C[\frac{1}{2}(A + B)]C[\frac{1}{2}(A - B)]$

(d) $C(A) - C(B) = 2S[\frac{1}{2}(A + B)]S[\frac{1}{2}(B - A)]$

(e) $S(A)C(B) = \frac{1}{2}[S(A + B) + S(A - B)]$

(f) $C(A)S(B) = \frac{1}{2}[S(A + B) - S(A - B)]$

(g) $C(A)C(B) = \frac{1}{2}[C(A + B) + C(A - B)]$

(h) $S(A)S(B) = \frac{1}{2}[C(A - B) - C(A + B)]$
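Proof. The proof is merely tedious algebra. Commutativity is required to invoke lemma 55, and by theorem 136 it also lets the sines and cosines of $\frac{1}{2}A$ and $\frac{1}{2}B$ be reordered freely.

(a)

$$2S[\tfrac{1}{2}(A + B)]C[\tfrac{1}{2}(A - B)]$$

$$= 2[S(\tfrac{1}{2}A)C(\tfrac{1}{2}B) + C(\tfrac{1}{2}A)S(\tfrac{1}{2}B)][C(\tfrac{1}{2}A)C(\tfrac{1}{2}B) + S(\tfrac{1}{2}A)S(\tfrac{1}{2}B)]$$

$$= 2[S(\tfrac{1}{2}A)C(\tfrac{1}{2}A)C^2(\tfrac{1}{2}B) + S^2(\tfrac{1}{2}A)S(\tfrac{1}{2}B)C(\tfrac{1}{2}B) + C^2(\tfrac{1}{2}A)S(\tfrac{1}{2}B)C(\tfrac{1}{2}B) + S(\tfrac{1}{2}A)C(\tfrac{1}{2}A)S^2(\tfrac{1}{2}B)]$$

$$= 2[(S^2(\tfrac{1}{2}A) + C^2(\tfrac{1}{2}A))S(\tfrac{1}{2}B)C(\tfrac{1}{2}B) + (S^2(\tfrac{1}{2}B) + C^2(\tfrac{1}{2}B))S(\tfrac{1}{2}A)C(\tfrac{1}{2}A)]$$

$$= 2[S(\tfrac{1}{2}B)C(\tfrac{1}{2}B) + S(\tfrac{1}{2}A)C(\tfrac{1}{2}A)]$$

$$= 2S(\tfrac{1}{2}A)C(\tfrac{1}{2}A) + 2S(\tfrac{1}{2}B)C(\tfrac{1}{2}B)$$

$$= S(A) + S(B)$$

using proposition 85 and theorem 137(a).

(b)-(d) Proof is similar to (a).

(e)

$$\frac{1}{2}[S(A + B) + S(A - B)] = \frac{1}{2}[S(A)C(B) + C(A)S(B) + S(A)C(B) - C(A)S(B)] = \frac{1}{2}[2S(A)C(B)] = S(A)C(B)$$

(f)-(h) Proof is similar to (e).

□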

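Theorem 139

$$[C(A) + iS(A)]^n = C(nA) + iS(nA)$$

Proof.

$$[C(A) + iS(A)]^n = \left(\frac{1}{2}e^{iA} + \frac{1}{2}e^{-iA} + \frac{1}{2}e^{iA} - \frac{1}{2}e^{-iA}\right)^n$$

$$= [e^{iA}]^n = e^{inA} = C(nA) + iS(nA)$$

where $[e^{iA}]^n = e^{inA}$ follows from repeated use of lemma 55. □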

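Theorem 140 Let $X, B \in M_n(\mathbb{C})$ and let $B$ be nonsingular. Then

$$BC(X)B^{-1} = C(BXB^{-1})$$

and

$$BS(X)B^{-1} = S(BXB^{-1})$$

Proof. Invoke theorem 130, which gives $B\exp(\pm iX)B^{-1} = \exp(\pm iBXB^{-1})$. To prove the first equality, we begin

$$BC(X)B^{-1} = \frac{1}{2}[B\exp(iX)B^{-1} + B\exp(-iX)B^{-1}] = \frac{1}{2}[\exp(iBXB^{-1}) + \exp(-iBXB^{-1})] = C(BXB^{-1})$$

Similarly, the second equality is shown:

$$BS(X)B^{-1} = \frac{1}{2i}[B\exp(iX)B^{-1} - B\exp(-iX)B^{-1}] = \frac{1}{2i}[\exp(iBXB^{-1}) - \exp(-iBXB^{-1})] = S(BXB^{-1})$$

□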

Corollary 43 Let $X, B \in M_n(\mathbb{C})$ and let $B$ be unitary. Then

$$BC(X)B^H = C(BXB^H)$$

and

$$BS(X)B^H = S(BXB^H)$$

Proof. Since $B$ is unitary, $B^{-1} = B^H$. With this substitution, the proof follows that of theorem 140. □

N.3 Relating Trace and Determinant

This is a particularly nice result because the trace and determinant operators are functions of only the unordered eigenvalues of $A$.

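Theorem 141 Let $A \in M_n(\mathbb{C})$. Then

$$\exp[\operatorname{tr}(A)] = \det[\exp(A)]$$

This is taken from Curtis (p. 55) [64].

Proof. Let $A$ be diagonalized by the similarity transformation $D = BAB^{-1}$ where $D$ is a diagonal matrix. Then

$$e^D = e^{BAB^{-1}} = Be^AB^{-1}$$

by theorem 130. We note

$$\det e^A = (\det B)(\det e^A)(\det B^{-1}) = \det(Be^AB^{-1}) = \det(e^{BAB^{-1}}) = \det e^D$$

Write $D = \sum_{i=1}^n D_i$, where $D_i$ is the diagonal matrix whose only nonzero entry is $d_i$ in position $(i, i)$. Then, by lemma 55,

$$\det e^D = \prod_{i=1}^n \det e^{D_i}$$

since the $D_i$ commute. Using the definition for matrix exponential, $e^{D_i}$ is the identity matrix with its $(i, i)$ entry replaced by $1 + d_i + \frac{1}{2!}d_i^2 + \cdots = e^{d_i}$, which implies $\det e^{D_i} = e^{d_i}$. So,

$$\det e^A = \prod_{i=1}^n e^{d_i} = \exp\left(\sum_{i=1}^n d_i\right) = \exp[\operatorname{tr}(D)] = \exp[\operatorname{tr}(BAB^{-1})] = \exp[\operatorname{tr}(AB^{-1}B)] = \exp(\operatorname{tr}A)$$

Therefore

$$\exp[\operatorname{tr}A] = \det[\exp A]$$

when $A$ can be diagonalized by a similarity transformation. □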
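A quick numerical check of theorem 141 compares $\det(e^A)$ with $e^{\operatorname{tr}A}$ for a $2 \times 2$ example. The proof above covers diagonalizable $A$; the sketch also tries the Jordan block $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, which is not diagonalizable, and the identity still holds there, consistent with the standard fact that $\det e^A = e^{\operatorname{tr}A}$ for every square matrix. The matrices and the helper names `matmul`, `mat_exp`, and `det2` are assumptions for this pure-Python illustration.

```python
import cmath


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=40):
    """Truncated series I + A + A^2/2! + ... (definition 80)."""
    n = len(A)
    term = [[complex(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[s + t for s, t in zip(srow, trow)]
                 for srow, trow in zip(total, term)]
    return total


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


A = [[0.5 + 0.2j, 1.0], [-0.3j, -0.1]]
J = [[1, 1], [0, 1]]                     # Jordan block, not diagonalizable
for M in (A, J):
    print(abs(det2(mat_exp(M)) - cmath.exp(M[0][0] + M[1][1])))
```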


Appendix  O 




USEFUL  IDENTITIES 

Identities  which  have  been  useful  in  the  development  of  this  work  are  recorded 
here.  Most  of  these  are  common  identities  recorded  here  for  convenience’s  sake. 
There  are,  however,  some  nontrivial  ones  near  the  end  of  this  short  section. 
The absence of a cited reference for the simpler identities merely indicates that they are very easy ones which I did myself and did not think important enough to find out who else has done them. I do not claim these as new contributions.

O.1 Sums

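Proposition 86

$$\sum_{k=b}^p k = \frac{(p + b)(p - b + 1)}{2}$$

Proof. This is a generalization of

$$\sum_{k=1}^n k = \frac{n(n + 1)}{2}$$

$$\sum_{k=b}^p k = (p - b + 1)(b - 1) + \sum_{k=1}^{p-b+1} k$$

$$= (p - b + 1)(b - 1) + \frac{1}{2}(p - b + 1)(p - b + 2)$$

$$= (p - b + 1)\left[b - 1 + \frac{1}{2}(p - b + 2)\right]$$

$$= \frac{1}{2}(p - b + 1)(2b - 2 + p - b + 2) = \frac{1}{2}(p + b)(p - b + 1)$$

□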
Proposition 87

$$\sum_{k=a}^b k = \sum_{i=1}^b i - \sum_{i=1}^{a-1} i = \frac{1}{2}\left[b(b + 1) - (a - 1)a\right]$$

Useful special cases:

$$\sum_{i=1}^{p-1} 2i = (p - 1)p$$

$$\sum_{i=2}^{p} 2i = p(p + 1) - 2$$

$$\sum_{i=2}^{p-1} 2i = (p - 1)p - 2$$
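The closed forms in propositions 86 and 87 are easy to confirm by brute force over a small grid. The sketch below is a minimal Python check; the function names `sum_b_to_p` and `sum_a_to_b` and the grid sizes are arbitrary choices for this illustration. Integer division is exact here because $(p + b)(p - b + 1)$ and $b(b + 1) - (a - 1)a$ are always even.

```python
def sum_b_to_p(b, p):
    """Closed form of sum_{k=b}^{p} k from proposition 86."""
    return (p + b) * (p - b + 1) // 2


def sum_a_to_b(a, b):
    """Closed form of sum_{k=a}^{b} k from proposition 87."""
    return (b * (b + 1) - (a - 1) * a) // 2


# Brute-force comparison over a small grid of limits.
for lo in range(1, 12):
    for hi in range(lo, 30):
        brute = sum(range(lo, hi + 1))
        assert sum_b_to_p(lo, hi) == brute == sum_a_to_b(lo, hi)

# The three special cases of proposition 87.
for p in range(2, 20):
    assert sum(2 * i for i in range(1, p)) == (p - 1) * p
    assert sum(2 * i for i in range(2, p + 1)) == p * (p + 1) - 2
    assert sum(2 * i for i in range(2, p)) == (p - 1) * p - 2
```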


O.2 Combinatorics


Proposition 88

$$(2m - 1)(2m - 3)(2m - 5)\cdots 3 \cdot 1 = \frac{(2m)!}{2^m m!}$$

For EVEN $m$:

$$(2m - 1)(2m - 3)(2m - 5)\cdots(m + 1) = \frac{(2m)!\,(\frac{m}{2})!}{2^{m/2}(m!)^2} = \frac{(\frac{m}{2})!}{2^{m/2}}\binom{2m}{m}$$

For ODD $m$:

$$(2m - 1)(2m - 3)(2m - 5)\cdots m = \frac{m\,(\frac{m-1}{2})!}{2^{(m+1)/2}}\binom{2m}{m}$$
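The identities in proposition 88 can be verified exactly with rational arithmetic for small $m$. The sketch below uses Python's `fractions.Fraction`, `math.comb`, and `math.factorial`; the helper names `odd_product`, `full_rhs`, and `partial_rhs`, and the range of $m$ tested, are choices made only for this illustration.

```python
from fractions import Fraction
from math import comb, factorial


def odd_product(hi, lo):
    """hi * (hi - 2) * ... * lo, stepping down by 2 (empty product is 1)."""
    out = 1
    for k in range(hi, lo - 1, -2):
        out *= k
    return out


def full_rhs(m):
    """(2m)! / (2^m m!), the first identity of proposition 88."""
    return Fraction(factorial(2 * m), 2 ** m * factorial(m))


def partial_rhs(m):
    """Right-hand side of the EVEN or ODD case of proposition 88."""
    if m % 2 == 0:
        return Fraction(factorial(m // 2), 2 ** (m // 2)) * comb(2 * m, m)
    return Fraction(m * factorial((m - 1) // 2), 2 ** ((m + 1) // 2)) * comb(2 * m, m)


for m in range(1, 15):
    assert odd_product(2 * m - 1, 1) == full_rhs(m)
    lower = m + 1 if m % 2 == 0 else m       # last factor of the partial product
    assert odd_product(2 * m - 1, lower) == partial_rhs(m)
```

For example, $m = 4$ gives $7 \cdot 5 = 35 = \frac{2!}{2^2}\binom{8}{4}$ and $m = 3$ gives $5 \cdot 3 = 15 = \frac{3 \cdot 1!}{2^2}\binom{6}{3}$.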


O.3 Classical Distribution Properties
Theorem 142 Chi-Square Distribution.

The $\chi^2$ distribution shows up in the evaluation of some important special cases in the properties of the Wishart distribution, both in the complex and real variables cases. The $\chi^2$ distribution in this thesis refers to the usual real-variables case of random variables for $\chi^2$. Many texts discuss the gamma distribution, and then point out that the $\chi^2$ is merely a special case. Although true, it is an important enough special case to have a life of its own.

There are some properties we need in this work, and they are tabulated here. These are copied from Canavos (p. 149) [50]. They can be found in many texts. Let $x \sim \chi^2_n$. Then the following results are true.

$$\mathcal{E}\{x\} = n$$

$$\operatorname{var}(x) = 2n$$

$$\operatorname{skewness}(x) = \sqrt{8/n}$$

$$\operatorname{kurtosis}(x) = 3 + 12/n$$

$$\text{mgf } m_x(t) = (1 - 2t)^{-n/2} \quad \text{for } t < \tfrac{1}{2}$$

$$\mathcal{E}\{x^m\} = 2^m\,\Gamma(\tfrac{n}{2} + m)/\Gamma(\tfrac{n}{2})$$

$$\mathcal{E}\{x^2\} = 4\,\Gamma(\tfrac{n}{2} + 2)/\Gamma(\tfrac{n}{2})$$

$$\mathcal{E}\{x^2\} = 4k(k + 1) \quad \text{for } n = 2k$$

$$\mathcal{E}\{y^2\} = k(k + 1) \quad \text{for } x = 2y \text{ and } n = 2k$$

$$\mathcal{E}\{x^{-2}\} = \frac{1}{(n - 2)(n - 4)} \quad \text{for } n > 4$$
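Most of the entries in theorem 142 follow from the single raw-moment formula $\mathcal{E}\{x^m\} = 2^m\Gamma(\frac{n}{2} + m)/\Gamma(\frac{n}{2})$. The sketch below evaluates that formula with Python's `math.gamma` and derives the variance, skewness, kurtosis, and $\mathcal{E}\{x^{-2}\}$ from it; the helper names `raw_moment` and `shape` and the choice $n = 7$ are assumptions for this illustration. Here "kurtosis" means the non-excess fourth standardized moment, matching the value $3 + 12/n$ in the table.

```python
from math import gamma, sqrt


def raw_moment(n, m):
    """E{x^m} = 2^m Gamma(n/2 + m) / Gamma(n/2) for x ~ chi^2_n (needs n/2 + m > 0)."""
    return 2.0 ** m * gamma(n / 2 + m) / gamma(n / 2)


def shape(n):
    """Skewness and (non-excess) kurtosis computed from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(n, k) for k in range(1, 5))
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2


n = 7
skew, kurt = shape(n)
print(raw_moment(n, 1), raw_moment(n, 2) - raw_moment(n, 1) ** 2)   # mean, variance
print(skew, sqrt(8 / n), kurt, 3 + 12 / n)
print(raw_moment(n, -2), 1 / ((n - 2) * (n - 4)))
```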


O.4 Functions of a Hermitian Positive Definite Matrix

Lemma 57 Let $U$ be Hermitian positive definite. Let

$$h(U) = [\det U]^{-n}\exp[-\operatorname{tr}(U^{-1})]$$

Then $h(U)$ is maximized when $U = \frac{1}{n}I$. This is a complexification of Arnold's lemma A.14 [31].

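Proof. This is a complexification and expansion of Arnold's proof. Let $U$ be a $p \times p$ Hermitian positive definite matrix. By theorem 118, $U^{-1}$ is also Hermitian positive definite. By theorem 115, let $U^{-1}$ have eigenvalues

$$T = \operatorname{diag}(t_1^2, \ldots, t_p^2)$$

with eigenvector matrix $\Gamma$. By Stewart corollary 6.5.3 [259] we know $t_i^2 > 0$ for all $i \in [1, p]$. Then by theorem 114 and theorem 113 we have

$$h(U) = \left(\prod_{i=1}^p t_i^2\right)^n\exp\left(-\sum_{i=1}^p t_i^2\right) = \prod_{i=1}^p (t_i^2)^ne^{-t_i^2}$$

Now, find the $(t_1^2, \ldots, t_p^2)$ that maximizes $h(t_1^2, \ldots, t_p^2)$ over the set where $t_i^2 > 0$, and then find the matrix associated with those eigenvalues. For a fixed $j$,

$$\frac{\partial h}{\partial t_j^2} = n(t_j^2)^{n-1}e^{-t_j^2}c - (t_j^2)^ne^{-t_j^2}c$$

where

$$c = \prod_{i \neq j}(t_i^2)^ne^{-t_i^2} > 0$$

because $t_i^2 > 0$ for all $i$. Continuing,

$$\frac{\partial h}{\partial t_j^2} = (t_j^2)^ne^{-t_j^2}\left(nt_j^{-2} - 1\right)c$$

From this we see that

$$\frac{\partial h}{\partial t_j^2} = 0$$

if and only if $nt_j^{-2} - 1 = 0$, which implies $t_j^2 = n$. Then

$$\frac{\partial^2 h}{\partial (t_j^2)^2} = \left[n(n - 1)(t_j^2)^{n-2} - 2n(t_j^2)^{n-1} + (t_j^2)^n\right]e^{-t_j^2}c = (t_j^2)^ne^{-t_j^2}\left[(n^2 - n)t_j^{-4} - 2nt_j^{-2} + 1\right]c$$

Evaluating the second partial derivative at $t_j^2 = n$ gives us

$$\left[1 - \frac{1}{n} - 2 + 1\right]n^ne^{-n}c = -n^{n-1}e^{-n}c < 0$$

Thus $t_j^2 = n$ gives us a maximum. Therefore

$$U^{-1} = \Gamma(nI)\Gamma^H = nI$$

which implies $U = \frac{1}{n}I$. □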

Theorem 143 Let $U$ and $A$ be Hermitian positive definite $p \times p$ complex matrices. Define

$$f(U) = [\det U]^{-n}\exp[-\operatorname{tr}(U^{-1}A)]$$

Then $f(U)$ is maximized when $U = \frac{1}{n}A$. This is a complexification of theorem A.15 of Arnold [31].

Proof. This is a complexification of Arnold's proof. By theorem 120, there exists $A^{1/2}$ such that $A = A^{1/2}A^{1/2}$. Then

$$f(U) = [\det U]^{-n}\exp[-\operatorname{tr}(U^{-1}A)] = [\det U]^{-n}\exp[-\operatorname{tr}(U^{-1}A^{1/2}A^{1/2})]$$

$$= [\det U]^{-n}\exp[-\operatorname{tr}(A^{1/2}U^{-1}A^{1/2})]$$

$$= [\det A]^{-n}[\det(A^{-1/2}UA^{-1/2})]^{-n}\exp\left(-\operatorname{tr}[(A^{-1/2}UA^{-1/2})^{-1}]\right)$$

$$= [\det A]^{-n}h(A^{-1/2}UA^{-1/2})$$

where $h$ is defined in lemma 57. Thus $f(U)$ is maximized when

$$A^{-1/2}UA^{-1/2} = \frac{1}{n}I$$

or equivalently, when $U = \frac{1}{n}A$. □
O.5 Properties of Unitarily Invariant Functions


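Lemma 58 Suppose that $C$ is a fixed $p \times p$ complex matrix and $Z$ is a $p \times p$ complex random matrix. If a function

$$g(C) = \mathcal{E}\{e^{\operatorname{tr}(CZ)}\}$$

satisfies

$$g(C) = \mathcal{E}\{e^{\operatorname{tr}(CZ)}\} = \mathcal{E}\{e^{\operatorname{tr}(CU^HZU)}\} = g(UCU^H)$$

for all $p \times p$ unitary matrices $U$, then

$$\mathcal{E}\{Z_{ji}Z_{lk}\} = g_{ij,kl} = \left.\frac{\partial^2 g(C)}{\partial C_{ij}\,\partial C_{kl}}\right|_{C=0} = b_1\delta_{ij}\delta_{kl} + b_2\delta_{il}\delta_{jk}$$

This is Tague's complexification [264] of Olkin and Rubin lemma 1 [199]. This lemma is used in the development of theory resulting in a beamforming example by Tague for computing the signal-to-noise ratio.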


Proof.  Let  U  =  [f/i,  U2,  •  •  • ,  Up]  be  a  unitary  matrix.  Then 


tT{CU”ZU)  =  EE  C„U^ZUi 

»=1  j=l 

Note  that  the  order  of  subscripts  of  Cij  are  the  opposite  of  ZU,.  Expanding 
the  assumed  functional  form  and  taking  derivatives,  we  obtain 

L  ZU)f{Z)idZ) 

OUijOtykl  Jz>0  (7=0 
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=  /  ZUiU[^  ZUketr{CU^ZU)fiZ){dZ) 

JZ>Q 


C=0 


This shows that

$$g_{ij,kl} = \mathcal{E}\{U_i^H Z U_j\,U_k^H Z U_l\}$$

for all unitary matrices $U$. Let $U = I$. Then

$$\mathcal{E}\{U_i^H Z U_i\,U_i^H Z U_i\} = \mathcal{E}\{e_i^H Z e_i\,e_i^H Z e_i\} = \mathcal{E}\{Z_{ii}Z_{ii}\} = \mathcal{E}\{Z_{ii}^2\} = g_{ii,ii}$$

By hypothesis, $g$ is invariant for all unitary $U$. Therefore, $g$ is unchanged if we let $U_i = e_j$ and $U_j = e_i$. Then

$$g_{ii,ii} = \mathcal{E}\{(Z_{ii})^2\} = \mathcal{E}\{U_i^H Z U_i\,U_i^H Z U_i\} = \mathcal{E}\{e_j^H Z e_j\,e_j^H Z e_j\} = \mathcal{E}\{(Z_{jj})^2\}$$

where $1 \le i, j \le p$. If a column of a unitary matrix $U$ is multiplied by a complex number with unit magnitude, then $U$ remains unitary. If we exchange columns of a unitary $U$, the new matrix is also unitary. By picking special cases of $U$, we can show that most second-order moments are zero.

Let $U = I$. Then

Now, let $U_i = e_i$ and $U_j = \sqrt{-1}\,e_j$. Then

Similarly,


$$g_{ii,ij} = g_{ii,jk} = g_{ij,ik} = g_{ij,kl} = 0$$

where $i$, $j$, $k$, and $l$ are distinct.

The only nonzero terms are $g_{ij,ji}$, $g_{ii,jj}$, and $g_{ii,ii}$, $1 \le i, j \le p$.

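We relate these nonzero terms by applying Olkin and Rubin's trick to evaluate

$$g_{ii,ii} = \mathcal{E}\{U_i^H Z U_i\,U_i^H Z U_i\} = \sum_{\alpha,\beta,\gamma,\delta}\bar{U}_{\alpha i}U_{\beta i}\bar{U}_{\gamma i}U_{\delta i}\,g_{\alpha\beta,\gamma\delta}$$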

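where $\bar{U}_{\beta j}$ denotes the complex conjugate of the complex scalar $U_{\beta j}$. Let $U = \exp(\epsilon F)$, where $F$ is skew-Hermitian (and therefore has purely imaginary diagonal terms). When $0 < \epsilon \ll 1$, then $U \approx I + \epsilon F$. Ignoring higher-order terms, the main diagonal elements of $U$ have the form $U_{ii} = 1 + \sqrt{-1}\,\alpha_i$ for $\alpha_i \in \mathbb{R}$. Also, writing $f_{ij} = \epsilon F_{ij}$, we have $U_{ij} = f_{ij}$ and $U_{ji} = -\bar{f}_{ij}$. The equation becomes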

$$2|f_{ij}|^2 g_{ii,ii} - 2|f_{ij}|^2 g_{ii,jj} - 2|f_{ij}|^2 g_{ij,ji} = 0$$


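which implies

$$g_{ii,ii} = g_{ii,jj} + g_{ij,ji} = b_1 + b_2$$

where $b_1 = g_{ii,jj}$ and $b_2 = g_{ij,ji}$ for $i \ne j$, and therefore

$$g_{ij,kl} = b_1\delta_{ij}\delta_{kl} + b_2\delta_{il}\delta_{jk}$$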


□ 


O.6 Some Special Definitions

Definition 83 A function obeying the rule $f(\delta Z) = \delta^{\alpha}f(Z)$ for all $\delta$ is called homogeneous of degree $\alpha$. This is a complexification of a definition given in class by Krantz.

The  next  two  definitions  should  be  blamed  on  me. 

Definition 84 A generalized even function $f$ is a function $f(Z)$ that obeys the rule $f(e^{j\theta}Z) = f(Z)$ for all $\theta \in \mathbb{R}$, where $Z \in \mathbb{C}^n$.

Definition 85 A generalized odd function $f$ is a function $f(Z)$ that obeys the rule $f(e^{j\theta}Z) = e^{j\theta}f(Z)$ for all $\theta \in \mathbb{R}$, where $Z \in \mathbb{C}^n$.

Notice that when $Z$ is restricted to $\mathbb{R}$, then $\theta \in \{n\pi \mid n \in \mathbb{Z}\}$, and these definitions specialize to the usual notions of $f(-x) = f(x)$ for even functions and $f(-x) = -f(x)$ for odd functions. Thus, odd functions are homogeneous of degree 1 in $e^{j\theta}$.

Definition 86 Two functions $f, g$ are called algebraically independent if, for any polynomial relation $\sum a_{ij}f^i g^j = 0$ with complex coefficients $a_{ij}$, we must have $a_{ij} = 0$ for all $i, j$. This definition was taken from Lang (p. 262) [160].

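O.7 Generalized Nested Operator

Definition 87 Nested Operator. Let $\Box$ and $\circ$ be operators such that

$$a \circ (b \Box c) = (a \circ b) \Box (a \circ c)$$

Then the nested operator is defined by

$$
\begin{aligned}
\mathop{\Lambda}_{k=1}^{n}(a_k \Box b_k) &\overset{\text{def}}{=} [a_1 \Box b_1 \circ (a_2 \Box b_2 \circ (a_3 \Box b_3 \circ (\cdots a_{n-1} \Box b_{n-1} \circ (a_n \Box b_n))))] \\
&= a_1 \Box (b_1 \circ a_2) \Box (b_1 \circ b_2 \circ a_3) \Box (b_1 \circ b_2 \circ b_3 \circ a_4) \Box \cdots \\
&\quad \Box (b_1 \circ b_2 \circ \cdots \circ b_{n-1} \circ a_n) \Box (b_1 \circ \cdots \circ b_n)
\end{aligned}
$$

where $n \in \mathbb{N}$. This is an extension of the definition given by Tuma (section 8.11) [268].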

Application: Polynomial. Let $\Box$ be ordinary addition, let $\circ$ be ordinary multiplication, and let $b_k = x$ for all $k$. Then

$$\mathop{\Lambda}_{k=1}^{n}(a_k + x) = a_1 + a_2x + a_3x^2 + \cdots + a_nx^{n-1} + x^n$$
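With $\Box$ as addition, $\circ$ as multiplication, and every $b_k = x$, evaluating the nested operator from the innermost group outward is Horner's rule. The Python sketch below is only an illustration. The function name `nested` is hypothetical, and so is the convention that `circ` binds more tightly than `box`; neither comes from the cited sources.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def nested(a: Sequence[T], b: Sequence[T],
           box: Callable[[T, T], T], circ: Callable[[T, T], T]) -> T:
    """Evaluate a_1 [] b_1 o (a_2 [] b_2 o ( ... o (a_n [] b_n))).

    'box' plays the role of the square operator and 'circ' the role of the
    small circle; circ is applied before box, as in the definition.
    """
    if not a or len(a) != len(b):
        raise ValueError("a and b must be non-empty and of equal length")
    acc = box(a[-1], b[-1])                    # innermost group a_n [] b_n
    for ak, bk in zip(reversed(a[:-1]), reversed(b[:-1])):
        acc = box(ak, circ(bk, acc))           # a_k [] (b_k o acc)
    return acc


# Polynomial application: a_1 + a_2 x + a_3 x^2 + x^3 at x = 2
x = 2
print(nested([3, 5, 7], [x, x, x], lambda u, v: u + v, lambda u, v: u * v))  # 49
```

Working from the inside out needs only $n$ applications of each operator. Expanding the distributive form first would need many more.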


Appendix  P 
INTEGRALS

The purpose of this portion is to make this thesis easier and quicker to read and understand, and to verify those integrals that were required by, or closely related to, the thesis work. Many of these integrals can be done by most sophomores; however, some may require explanation. In many cases these integrals were not in Gradshteyn and Ryzhik [94] or in Abramowitz and Stegun [1]. The integrals are ordered according to their use of prior results. They fall into several categories. The most interesting category consists of integrals over groups and integrals that involve zonal polynomials and hypergeometric functions. The next most interesting grouping consists of integrals over matrices. Finally, there are the routine, tedious integrals, which are uninteresting and should be done only once in a lifetime; hence they are recorded here so that they will not have to be done again.

The most important integrals are the ones that define the multivariate gamma function, the matrix Laplace transform, and those involving hypergeometric functions of one and two matrix arguments.
P.1 Easy Chain Rule Bookkeeping Method

These are the uninteresting integrals. I will try to make them more palatable by introducing a bookkeeping method that reduces the work involved in applying the chain rule.

Lemma 59 Chain Rule Evaluation (integration by parts). $\int u\,dv = uv - \int v\,du$.

Sometimes evaluating an integral by using the chain rule yields a long sequence of steps. To reduce the labor (and thus the opportunity for clerical error), the simple convention below permits efficient iteration. Given $\int u\,dv$, write the $uv$ term on the left half of a line. Follow the $uv$ term by a vertical dashed line, which is then followed on the right half of that line by the $-\int v\,du$ term, left in its integral form. This integral is now operated on by the chain rule, with the result on the next line. The process continues repeatedly. The solution to the original integral is the sum of the terms to the left of the dashed line.

An example of this technique is given in lemma 63, the evaluation of

$$\int (xt + y)^n e^{-zxt}\,dt$$
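$$
\begin{array}{l|l}
\int u^{(m)}v^{(n)} & \\
\quad = u^{(m)}v^{(n-1)} & -\int u^{(m+1)}v^{(n-1)} \\
\quad\quad - u^{(m+1)}v^{(n-2)} & +\int u^{(m+2)}v^{(n-2)} \\
\quad\quad + u^{(m+2)}v^{(n-3)} & -\int u^{(m+3)}v^{(n-3)} \\
\quad\quad - u^{(m+3)}v^{(n-4)} & \quad\cdots \\
\text{sum is solution} &
\end{array}
$$

Here a superscript $(k)$ indicates the order of differentiation, so that $v^{(n-1)}$ is an antiderivative of $v^{(n)}$.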
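The bookkeeping can be automated when $u$ is a polynomial and $dv = e^{ct}\,dt$. In that case the right column eventually differentiates the polynomial to zero, so the table terminates. Below is a minimal Python sketch. It assumes polynomials are stored as coefficient lists, lowest degree first, and all function names are hypothetical.

```python
import math


def poly_derivative(coeffs):
    """coeffs[i] is the coefficient of t**i; return the coefficients of p'(t)."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def poly_eval(coeffs, t):
    return sum(c * t ** i for i, c in enumerate(coeffs))


def tabular_antiderivative(coeffs, c):
    """Antiderivative of p(t) * exp(c*t) using the dashed-line bookkeeping.

    Row k leaves (-1)**k * p^(k)(t) * exp(c*t) / c**(k+1) in the left column;
    the right column is differentiated again until p^(k) vanishes.
    """
    if c == 0:
        raise ValueError("c must be nonzero")
    rows = []
    d, k = list(coeffs), 0
    while d:
        rows.append(((-1) ** k / c ** (k + 1), d))
        d, k = poly_derivative(d), k + 1

    def F(t):
        # the answer is the sum of the left-column terms
        return math.exp(c * t) * sum(s * poly_eval(p, t) for s, p in rows)

    return F


F = tabular_antiderivative([0, 0, 1], -1.0)   # integrand t^2 e^{-t}
print(F(50.0) - F(0.0))                        # close to Gamma(3) = 2
```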

P.2  Mundane  Integrals 

P.2.1  Exponential  Integral  Definition 

Lemma 60 Exponential Integral. This integral is listed here for reference purposes. It is used to express the results of one of the test distributions for examining a disjoint combination of sample eigenvalues.

$$E_1(z) = -\mathrm{Ei}(-z) = \int_z^{\infty} t^{-1}e^{-t}\,dt, \quad z > 0$$

This is found in Abramowitz and Stegun (p. 288) [1]. The path of integration excludes $t = 0$ and does not cross the negative real axis. A series expansion is

$$\mathrm{Ei}(x) = \gamma + \ln x + \sum_{n=1}^{\infty}\frac{x^n}{n\,n!}, \quad x > 0$$

where $\gamma = 0.57721\,56649\ldots$ is Euler's constant. $\mathrm{Ei}(x)$ is tabulated in Abramowitz and Stegun [1] for $x$ up to 2.
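Here is a minimal Python sketch of the series above. It assumes double-precision arithmetic and moderate $|x|$; large $|x|$ would need an asymptotic expansion instead. Replacing $\ln x$ by $\ln|x|$ gives $\mathrm{Ei}(-u) = -E_1(u)$ for $u > 0$, which is the form used later in this appendix. The name `ei_series` is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def ei_series(x: float, tol: float = 1e-16, max_terms: int = 500) -> float:
    """Ei(x) = gamma + ln|x| + sum_{n>=1} x**n / (n * n!).

    With ln|x| the same series gives Ei(-u) = -E1(u) for u > 0.
    Suitable for moderate |x| only.
    """
    if x == 0:
        raise ValueError("Ei has a logarithmic singularity at 0")
    total = 0.0
    term = 1.0                      # holds x**n / n!
    for n in range(1, max_terms):
        term *= x / n
        contrib = term / n
        total += contrib
        if abs(contrib) < tol * max(1.0, abs(total)):
            break
    return EULER_GAMMA + math.log(abs(x)) + total


print(ei_series(1.0))    # about 1.8951178
print(ei_series(-1.0))   # about -0.2193839, i.e. -E1(1)
```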

P.2.2  Integrals  of  Rational  Functions 


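Proposition 89

$$\int t(xt+y)^{-1}\,dt = \frac{1}{x^2}\left[(xt+y) - y\ln(xt+y)\right]$$

Proof.

$$
\begin{aligned}
\int t(xt+y)^{-1}\,dt &= \frac{1}{x}\int t\,d[\ln(xt+y)] \\
&= \frac{t}{x}\ln(xt+y) - \frac{1}{x^2}\int \ln(xt+y)\,d(xt+y) \\
&= \frac{t}{x}\ln(xt+y) - \frac{1}{x^2}(xt+y)[\ln(xt+y) - 1] \\
&= \frac{1}{x^2}\left[(xt+y) - y\ln(xt+y)\right]
\end{aligned}
$$

□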
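Proposition 89 can be checked numerically. Compare the change in the antiderivative over an interval with a quadrature of the integrand over the same interval. This sketch uses composite Simpson's rule as an independent check; that is a swapped-in numerical method, not part of the derivation. The sample values $x = 2$ and $y = 3$ are arbitrary.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] (n is rounded up to an even number)."""
    n += n % 2
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def prop89(t, x, y):
    """(1/x^2)[(xt + y) - y ln(xt + y)]; requires xt + y > 0."""
    u = x * t + y
    return (u - y * math.log(u)) / x ** 2


x, y = 2.0, 3.0
exact = prop89(5.0, x, y) - prop89(0.0, x, y)
approx = simpson(lambda t: t / (x * t + y), 0.0, 5.0)
print(exact, approx)
```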


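Proposition 90

$$\int t^n(xt+y)^{-m}\,dt = -\sum_{k=1}^{n+1}\frac{n!\,(m-1-k)!}{(m-1)!\,(n+1-k)!\,x^k}\,t^{n+1-k}(xt+y)^{-(m-k)}$$

for $n < m - 1$.

Proof.

$$
\begin{aligned}
\int t^n(xt+y)^{-m}\,dt &= -\frac{t^n(xt+y)^{-(m-1)}}{(m-1)x} + \frac{n}{(m-1)x}\int t^{n-1}(xt+y)^{-(m-1)}\,dt \\
&= -\frac{t^n(xt+y)^{-(m-1)}}{(m-1)x} - \frac{n\,t^{n-1}(xt+y)^{-(m-2)}}{(m-1)(m-2)x^2} + \frac{n(n-1)}{(m-1)(m-2)x^2}\int t^{n-2}(xt+y)^{-(m-2)}\,dt \\
&= -\frac{t^n(xt+y)^{-(m-1)}}{(m-1)x} - \frac{n\,t^{n-1}(xt+y)^{-(m-2)}}{(m-1)(m-2)x^2} - \frac{n(n-1)\,t^{n-2}(xt+y)^{-(m-3)}}{(m-1)(m-2)(m-3)x^3} + \cdots
\end{aligned}
$$

The $k$th term produced in this way is

$$-\frac{n(n-1)\cdots(n-k+2)}{(m-1)(m-2)\cdots(m-k)}\,\frac{1}{x^k}\,t^{n-k+1}(xt+y)^{-(m-k)}$$

and the process ends with $k = n + 1$, when the power of $t$ reaches zero. The condition $n < m - 1$ ensures that the last integral, $\int (xt+y)^{-(m-n)}\,dt$, does not produce a logarithm. Since

$$\frac{n(n-1)\cdots(n-k+2)}{(m-1)(m-2)\cdots(m-k)} = \frac{n!\,(m-1-k)!}{(m-1)!\,(n+1-k)!}$$

summing the terms gives the result. □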
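The closed form of proposition 90 is easy to mistype, so a numerical derivative is a cheap check. This Python sketch assumes $0 \le n < m - 1$ and $xt + y > 0$. The helper name `prop90` and the sample values are hypothetical.

```python
import math


def prop90(t, n, m, x, y):
    """Antiderivative of t^n (xt + y)^(-m) from proposition 90.

    -sum_{k=1}^{n+1} n!(m-1-k)! / ((m-1)!(n+1-k)! x^k) t^(n+1-k) (xt + y)^(-(m-k))
    """
    if not 0 <= n < m - 1:
        raise ValueError("requires 0 <= n < m - 1")
    u = x * t + y
    total = 0.0
    for k in range(1, n + 2):
        coef = math.factorial(n) * math.factorial(m - 1 - k) / (
            math.factorial(m - 1) * math.factorial(n + 1 - k) * x ** k)
        total -= coef * t ** (n + 1 - k) * u ** (-(m - k))
    return total


t, n, m, x, y, h = 1.3, 2, 5, 0.7, 1.1, 1e-5
deriv = (prop90(t + h, n, m, x, y) - prop90(t - h, n, m, x, y)) / (2 * h)
print(deriv, t ** n / (x * t + y) ** m)
```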


Proposition 91

$$\int t^m(xt+y)^{-m}\,dt = -\sum_{k=1}^{m-1}\frac{m}{(m-k+1)(m-k)\,x^k}\,t^{m+1-k}(xt+y)^{-(m-k)} + \frac{m}{x^{m+1}}\left[(xt+y) - y\ln(xt+y)\right]$$

Proof. In proposition 90, let $n = m$ and consider the last two terms when $k = m - 1$. We get

$$-\frac{m(m-1)\cdots(3)}{(m-1)(m-2)\cdots(1)}\,\frac{1}{x^{m-1}}\,t^2(xt+y)^{-1} + \frac{m(m-1)\cdots(2)}{(m-1)(m-2)\cdots(1)}\,\frac{1}{x^{m-1}}\int t(xt+y)^{-1}\,dt$$

Recall proposition 89. We get

$$-\frac{m}{2x^{m-1}}\,t^2(xt+y)^{-1} + \frac{m}{x^{m+1}}\left[(xt+y) - y\ln(xt+y)\right]$$

Also note:

$$\frac{(n-1-k)!\,n!}{(n-1)!\,(n+1-k)!} = \frac{n}{(n-k+1)(n-k)}$$

Substitution into the last two terms of proposition 90 produces the result. □


Lemma 61

$$\int \frac{x^m}{ax+b}\,dx = \sum_{k=0}^{m-1}(-1)^k\frac{1}{a}\left(\frac{b}{a}\right)^k\frac{x^{m-k}}{m-k} + (-1)^m\frac{1}{a}\left(\frac{b}{a}\right)^m\ln(ax+b)$$

Proof. Solve by brute-force algebraic division with a remainder term, and then integrate the result.

$$\frac{x^m}{ax+b} = \frac{1}{a}x^{m-1} - \frac{1}{a}\left(\frac{b}{a}\right)x^{m-2} + \cdots$$

The result of the $k$th division (starting with $k = 0$) produces the term

$$(-1)^k\frac{1}{a}\left(\frac{b}{a}\right)^k x^{m-k-1}$$

with a remainder for that division being

$$(-1)^{k+1}\left(\frac{b}{a}\right)^{k+1}\frac{x^{m-k-1}}{ax+b}$$

Division number $k = m - 1$ produces a remainder term with a zero power of $x$, namely $(-1)^m(b/a)^m/(ax+b)$. Thus the result of the division is

$$\frac{x^m}{ax+b} = \sum_{k=0}^{m-1}(-1)^k\frac{1}{a}\left(\frac{b}{a}\right)^k x^{m-k-1} + (-1)^m\left(\frac{b}{a}\right)^m\frac{1}{ax+b}$$

Integrate this with respect to $x$ to get the final result. □
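Here is Lemma 61 in code, checked against Simpson's rule on an interval where $ax + b > 0$. This is a sketch only; the helper names and parameter values are arbitrary.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]."""
    n += n % 2
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def lemma61(x, m, a, b):
    """Antiderivative of x^m / (ax + b); requires ax + b > 0."""
    r = b / a
    poly = sum((-1) ** k * r ** k * x ** (m - k) / (a * (m - k)) for k in range(m))
    return poly + (-1) ** m * r ** m * math.log(a * x + b) / a


m, a, b = 3, 2.0, 1.0
exact = lemma61(2.0, m, a, b) - lemma61(0.0, m, a, b)
approx = simpson(lambda x: x ** m / (a * x + b), 0.0, 2.0)
print(exact, approx)
```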


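Theorem 144 Let $k$ and $p$ be positive integers. Then

$$
\int \frac{x^k}{(ax+b)^p}\,dx =
\begin{cases}
\dfrac{1}{a^{k+1}}\displaystyle\sum_{m=0}^{k}\binom{k}{m}(-b)^m\frac{(ax+b)^{k-m-p+1}}{k-m-p+1}, & k < p-1 \\[3ex]
\dfrac{1}{a^{k+1}}\left[\displaystyle\sum_{\substack{m=0\\ m\ne k-p+1}}^{k}\binom{k}{m}(-b)^m\frac{(ax+b)^{k-m-p+1}}{k-m-p+1} + \binom{k}{k-p+1}(-b)^{k-p+1}\ln(ax+b)\right], & k \ge p-1
\end{cases}
$$

Proof. Let $z = ax + b$. Then $x = \frac{1}{a}(z - b)$ and $dx = \frac{1}{a}dz$. Performing the change of variables, we get

$$\int \frac{x^k}{(ax+b)^p}\,dx = \frac{1}{a}\int\left[\frac{1}{a}(z-b)\right]^k z^{-p}\,dz = \frac{1}{a^{k+1}}\sum_{m=0}^{k}\binom{k}{m}(-b)^m\int z^{(k-m-p+1)-1}\,dz$$

where the complicated exponent on $z$ was chosen for the expansion to keep the dependence on $k$ explicit. When $k \ge p - 1$, we have a special case when $m = k - p + 1$, for which the exponent on $z$ is $-1$ and the integral is $\ln z$. We separate this term from the sum to sharpen this dependence. The result follows from the integration. □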


P.2.3  Integrals  Related  to  the  Gamma  Function 

Definition 88 The gamma function is defined by

$$\Gamma(z) = \int_0^{\infty} t^{z-1}e^{-t}\,dt$$

where $\mathrm{Re}(z) > 0$ and $z$ is complex.

Recall that $z\Gamma(z) = \Gamma(z+1)$, and when $n$ is a nonnegative integer we get $\Gamma(n+1) = n!$.
Lemma  62  Let  Re(Qz  +  1)  >  0.  Then 

roo 

/  t^^e-^^dt  =  +  1) 

Jo 

Proof.  Perform  the  change  of  variables  x  =  fit.  Then  t  =  f-  and  dt  =  ^dx. 
The  new  limits  of  integration  are  (0,  oo).  Then 

e^e-^Ut  =  j  x^^e-^dx 

fOO 

Jq 


when  Re(Q2  +  1)  >  0.  □ 
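Theorem 145 Let $n$ be a positive integer, and let $a$ and $b$ be real numbers. Then

$$\int_a^b t^n e^{-t}\,dt = n!\sum_{k=0}^{n}\frac{a^k e^{-a} - b^k e^{-b}}{k!} = 2\,n!\exp\left(-\frac{a+b}{2}\right)\sum_{k=0}^{n}\frac{(ab)^{k/2}}{k!}\sinh[\omega(k)]$$

where the second form requires $a, b > 0$ and $\omega(k)$ is defined in the proof.

Proof. Apply the chain rule.

$$
\begin{aligned}
\int_a^b t^n e^{-t}\,dt &= -\left.t^n e^{-t}\right|_a^b + n\int_a^b t^{n-1}e^{-t}\,dt \\
&= -\left.\left(t^n + nt^{n-1} + n(n-1)t^{n-2} + \cdots + n!\right)e^{-t}\right|_a^b \\
&= n!\sum_{k=0}^{n}\frac{a^k e^{-a} - b^k e^{-b}}{k!}
\end{aligned}
$$

The last term in brackets looks inviting because of its symmetry, but it can be a false oasis. If you go through the mathematics and let

$$\omega(k) = \frac{1}{2}\left[b - a + k\ln\left(\frac{a}{b}\right)\right] = \frac{1}{2}\left[(b - k\ln b) - (a - k\ln a)\right]$$

then you can manipulate the bracketed term:

$$
\begin{aligned}
a^k e^{-a} - b^k e^{-b} &= \exp\left(-\frac{a+b}{2}\right)(ab)^{k/2}\left\{\exp\left[\frac{b-a}{2} + \frac{k}{2}\ln\left(\frac{a}{b}\right)\right] - \exp\left[-\left(\frac{b-a}{2} + \frac{k}{2}\ln\left(\frac{a}{b}\right)\right)\right]\right\} \\
&= 2\exp\left(-\frac{a+b}{2}\right)(ab)^{k/2}\sinh[\omega(k)]
\end{aligned}
$$

Then the result becomes

$$n!\sum_{k=0}^{n}\left(e^{-a}a^k - e^{-b}b^k\right)\frac{1}{k!} = 2\,n!\exp\left(-\frac{a+b}{2}\right)\sum_{k=0}^{n}\frac{(ab)^{k/2}}{k!}\sinh[\omega(k)]$$

□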
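Both forms of theorem 145 can be compared against a numerical quadrature. In this sketch the second (sinh) form requires $a, b > 0$, because $\omega(k)$ contains $\ln a$ and $\ln b$. The function names are hypothetical, and Simpson's rule serves only as an independent check.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]."""
    n += n % 2
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def thm145(n, a, b):
    """n! * sum_{k=0}^{n} (a^k e^{-a} - b^k e^{-b}) / k!"""
    return math.factorial(n) * sum(
        (a ** k * math.exp(-a) - b ** k * math.exp(-b)) / math.factorial(k)
        for k in range(n + 1))


def thm145_sinh(n, a, b):
    """2 n! e^{-(a+b)/2} sum_k (ab)^{k/2} sinh(omega(k)) / k!; requires a, b > 0."""
    total = 0.0
    for k in range(n + 1):
        omega = 0.5 * ((b - k * math.log(b)) - (a - k * math.log(a)))
        total += (a * b) ** (k / 2) * math.sinh(omega) / math.factorial(k)
    return 2 * math.factorial(n) * math.exp(-(a + b) / 2) * total


n, a, b = 4, 0.5, 3.0
quad = simpson(lambda t: t ** n * math.exp(-t), a, b)
print(thm145(n, a, b), thm145_sinh(n, a, b), quad)
```

Setting $a = 0$ in the first form (Python evaluates `0.0 ** 0` as `1.0`) reproduces corollary 44, as the check below illustrates.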


Corollary 44

$$\int_0^b t^n e^{-t}\,dt = n!\left(1 - e^{-b}\sum_{k=0}^{n}\frac{b^k}{k!}\right)$$

Proof. In theorem 145, let $a = 0$. Notice that the second form of the result is not as useful because $\ln(0)$ is undefined in the definition of $\omega(k)$. □

Corollary 45

$$\int_a^{\infty} t^n e^{-t}\,dt = n!\,e^{-a}\sum_{k=0}^{n}\frac{a^k}{k!}$$

Proof. In theorem 145, let $b = \infty$. □


Corollary 46

$$\int_0^{\infty} t^n e^{-t}\,dt = n! = \Gamma(n+1)$$

Remark. This follows directly from the definition of $\Gamma(n+1)$, but we obtained it by letting $a = 0$ and $b = \infty$ in theorem 145. □
Corollary 47

Proof. Let $t = cw$ be a change of variables. Then $w = t/c$ and $dw = \frac{1}{c}\,dt$. The limits become $(0, \infty)$.


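Corollary 48

$$\int_0^{\infty} e^{-\alpha x}x^m\,dx = \alpha^{-(m+1)}\Gamma(m+1), \quad \alpha > 0$$

Proof. Let $t = \alpha x$ be a change of variables. Then

$$\int_0^{\infty} e^{-\alpha x}x^m\,dx = \int_0^{\infty} e^{-t}\left(\frac{t}{\alpha}\right)^m\frac{dt}{\alpha} = \alpha^{-(m+1)}\int_0^{\infty} t^m e^{-t}\,dt$$

where we note that $\mathrm{Re}(m+1) = m + 1 > 0$ for $m \ge 0$. By the definition of the gamma function, we get

$$\alpha^{-(m+1)}\Gamma(m+1) = \frac{\Gamma(m+1)}{\alpha^{m+1}}$$

□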


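Lemma 63

$$\int (xt+y)^n e^{-zxt}\,dt = -\frac{1}{x}\sum_{m=0}^{n}\left(\frac{1}{z}\right)^{m+1}\frac{n!}{(n-m)!}(xt+y)^{n-m}e^{-zxt}$$

Proof. Apply the chain rule.

$$
\begin{array}{l|l}
\int (xt+y)^n e^{-zxt}\,dt & \\
\quad = -\frac{1}{x}\left(\frac{1}{z}\right)(xt+y)^n e^{-zxt} & +\left(\frac{1}{z}\right)n\int (xt+y)^{n-1}e^{-zxt}\,dt \\
\quad\quad -\frac{1}{x}\left(\frac{1}{z}\right)^2 n(xt+y)^{n-1}e^{-zxt} & +\left(\frac{1}{z}\right)^2 n(n-1)\int (xt+y)^{n-2}e^{-zxt}\,dt \\
\quad\quad -\frac{1}{x}\left(\frac{1}{z}\right)^3 n(n-1)(xt+y)^{n-2}e^{-zxt} & +\left(\frac{1}{z}\right)^3 n(n-1)(n-2)\int (xt+y)^{n-3}e^{-zxt}\,dt \\
\quad\quad \vdots & \quad\vdots \\
\quad\quad -\frac{1}{x}\left(\frac{1}{z}\right)^{n+1}n!\,e^{-zxt} &
\end{array}
$$

To get the answer, we sum the results in the left column. We get

$$\int (xt+y)^n e^{-zxt}\,dt = -\frac{1}{x}\sum_{m=0}^{n}\left(\frac{1}{z}\right)^{m+1}\frac{n!}{(n-m)!}(xt+y)^{n-m}e^{-zxt}$$

□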
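Here is a quick finite-difference check of lemma 63, assuming $x, z \neq 0$. The sample values are arbitrary and the helper name is hypothetical.

```python
import math


def lemma63(t, n, x, y, z):
    """-(1/x) sum_{m=0}^{n} z^-(m+1) n!/(n-m)! (xt + y)^(n-m) e^(-zxt)."""
    u = x * t + y
    s = sum(z ** -(m + 1) * math.factorial(n) / math.factorial(n - m) * u ** (n - m)
            for m in range(n + 1))
    return -s * math.exp(-z * x * t) / x


t, n, x, y, z, h = 0.8, 3, 1.3, 0.4, 0.9, 1e-5
deriv = (lemma63(t + h, n, x, y, z) - lemma63(t - h, n, x, y, z)) / (2 * h)
print(deriv, (x * t + y) ** n * math.exp(-z * x * t))
```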


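Proposition 92

$$\int x^n e^{-ax}\,dx = \sum_{k=0}^{n}(-1)^k\left(-\frac{1}{a}\right)^{k+1}\frac{n!}{(n-k)!}x^{n-k}e^{-ax}, \quad n \ge 0$$

Proof.

$$
\begin{array}{l|l}
\int x^n e^{-ax}\,dx & \\
\quad = \left(-\frac{1}{a}\right)x^n e^{-ax} & -\left(-\frac{1}{a}\right)n\int x^{n-1}e^{-ax}\,dx \\
\quad\quad -\left(-\frac{1}{a}\right)^2 n\,x^{n-1}e^{-ax} & +\left(-\frac{1}{a}\right)^2 n(n-1)\int x^{n-2}e^{-ax}\,dx \\
\quad\quad +\left(-\frac{1}{a}\right)^3 n(n-1)x^{n-2}e^{-ax} & -\left(-\frac{1}{a}\right)^3 n(n-1)(n-2)\int x^{n-3}e^{-ax}\,dx \\
\quad\quad \vdots & \quad\vdots
\end{array}
$$

$$= \sum_{k=0}^{n}(-1)^k\left(-\frac{1}{a}\right)^{k+1}\frac{n!}{(n-k)!}x^{n-k}e^{-ax} \quad \text{for } n \ge 0$$

□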


P.2.4  Ratio  of  Exponential  to  Algebraic  Term 

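Proposition 93

$$\int \frac{e^{-zxt}}{xt+y}\,dt = \frac{e^{zy}}{x}\,\mathrm{Ei}[-z(xt+y)]$$

Proof. Perform a change of variables. Let $u = z(xt+y)$. Then

$$e^{-zxt} = e^{zy}e^{-u}$$

and

$$dt = \frac{du}{xz}$$

so that

$$\int (xt+y)^{-1}e^{-zxt}\,dt = \int \frac{z}{u}\,e^{zy}e^{-u}\,\frac{du}{xz} = \frac{e^{zy}}{x}\int u^{-1}e^{-u}\,du = \frac{e^{zy}}{x}\mathrm{Ei}(-u) = \frac{e^{zy}}{x}\mathrm{Ei}[-z(xt+y)]$$

The proof is more easily seen to be consistent with other references regarding the exponential integral $\mathrm{Ei}(-u)$ by considering the definite integral

$$\int_a^b (xt+y)^{-1}\exp(-zxt)\,dt$$

The change of variables yields

$$
\begin{aligned}
\frac{e^{zy}}{x}\int_{z(xa+y)}^{z(xb+y)}u^{-1}e^{-u}\,du &= \frac{e^{zy}}{x}\left[\int_{z(xa+y)}^{\infty}u^{-1}e^{-u}\,du - \int_{z(xb+y)}^{\infty}u^{-1}e^{-u}\,du\right] \\
&= \frac{e^{zy}}{x}\left\{-\mathrm{Ei}[-z(xa+y)] + \mathrm{Ei}[-z(xb+y)]\right\} \\
&= \frac{e^{zy}}{x}\left\{\mathrm{Ei}[-z(xb+y)] - \mathrm{Ei}[-z(xa+y)]\right\}
\end{aligned}
$$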


Proposition 94 Let $n > 0$. Then

$$\int (xt+y)^{-n}e^{-zxt}\,dt = -\frac{e^{-zxt}}{x}\sum_{m=1}^{n-1}(-z)^{m-1}\frac{(n-m-1)!}{(n-1)!}(xt+y)^{-(n-m)} + \frac{(-z)^{n-1}}{(n-1)!}\,\frac{e^{zy}}{x}\,\mathrm{Ei}[-z(xt+y)]$$

Proof. Apply the chain rule. Each application gives

$$\int (xt+y)^{-k}e^{-zxt}\,dt = -\frac{e^{-zxt}(xt+y)^{-(k-1)}}{(k-1)x} - \frac{z}{k-1}\int (xt+y)^{-(k-1)}e^{-zxt}\,dt, \quad k \ge 2$$

so the left column accumulates the terms of the sum. After $n - 1$ steps the remaining integral is, by proposition 93,

$$\frac{(-z)^{n-1}}{(n-1)!}\int (xt+y)^{-1}e^{-zxt}\,dt = \frac{(-z)^{n-1}}{(n-1)!}\,\frac{e^{zy}}{x}\,\mathrm{Ei}[-z(xt+y)]$$

Add the contents of the left column to obtain the result. □
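Proposition 94 reduces to proposition 93 when $n = 1$. Both can be checked by differentiating the antiderivative numerically. The sketch below evaluates $\mathrm{Ei}$ at negative arguments with the series of lemma 60, using $\ln|x|$. It assumes $z(xt + y) > 0$ and moderate arguments, and all names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def ei(x, max_terms=400):
    """Ei(x) from gamma + ln|x| + sum x^n/(n n!); adequate for moderate |x|."""
    total, term = 0.0, 1.0
    for n in range(1, max_terms):
        term *= x / n
        total += term / n
        if abs(term / n) < 1e-17:
            break
    return EULER_GAMMA + math.log(abs(x)) + total


def prop94(t, n, x, y, z):
    """Antiderivative of (xt + y)^(-n) e^(-zxt); requires z(xt + y) > 0."""
    u = x * t + y
    s = sum((-z) ** (m - 1) * math.factorial(n - m - 1) / math.factorial(n - 1)
            * u ** (-(n - m)) for m in range(1, n))
    ei_part = (-z) ** (n - 1) / math.factorial(n - 1) * math.exp(z * y) / x * ei(-z * u)
    return -math.exp(-z * x * t) / x * s + ei_part


t, x, y, z, h = 0.9, 1.2, 0.5, 0.7, 1e-5
for n in (1, 2, 3):
    d = (prop94(t + h, n, x, y, z) - prop94(t - h, n, x, y, z)) / (2 * h)
    print(n, d, (x * t + y) ** (-n) * math.exp(-z * x * t))
```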




P.2.5  Product  of  Rational  Term  and  Exponential 
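Theorem 146

$$\int \frac{x^n}{ax+b}e^{-cx}\,dx = (-1)^n\frac{1}{a}\left(\frac{b}{a}\right)^n e^{bc/a}\ln(ax+b) + \sum_{m=0}^{\infty}\sum_{k=0}^{n+m-1}(-1)^{m+k}\frac{c^m}{a}\left(\frac{b}{a}\right)^k\frac{x^{n+m-k}}{m!\,(n+m-k)}$$

Proof.

$$I = \int \frac{x^n}{ax+b}e^{-cx}\,dx = \int \frac{x^n}{ax+b}\sum_{m=0}^{\infty}\frac{(-cx)^m}{m!}\,dx = \sum_{m=0}^{\infty}\frac{(-c)^m}{m!}\int \frac{x^{n+m}}{ax+b}\,dx$$

Apply lemma 61.

$$I = \sum_{m=0}^{\infty}\frac{(-c)^m}{m!}\left[\sum_{k=0}^{n+m-1}(-1)^k\frac{1}{a}\left(\frac{b}{a}\right)^k\frac{x^{n+m-k}}{n+m-k} + (-1)^{n+m}\frac{1}{a}\left(\frac{b}{a}\right)^{n+m}\ln(ax+b)\right]$$

The factor $x^{n+m-k}$ in the first term prevents us from extracting an exponential. However, look at the second term:

$$\sum_{m=0}^{\infty}\frac{(-c)^m}{m!}(-1)^{n+m}\frac{1}{a}\left(\frac{b}{a}\right)^{n+m}\ln(ax+b) = (-1)^n\frac{1}{a}\left(\frac{b}{a}\right)^n\ln(ax+b)\sum_{m=0}^{\infty}\frac{1}{m!}\left(\frac{bc}{a}\right)^m = (-1)^n\frac{1}{a}\left(\frac{b}{a}\right)^n e^{bc/a}\ln(ax+b)$$

The final result is the sum of the terms. Exchanging the order of summation, with the corresponding adjustment of limits, did not simplify the result. □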
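Theorem 146 is an infinite double series, so any practical evaluation must truncate the sum over $m$. This sketch truncates at a fixed number of terms, which is an assumption that is adequate only when $|cx|$ and $|bc/a|$ are moderate. It then compares the result with Simpson's rule. The names and sample values are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]."""
    n += n % 2
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def thm146(x, n, a, b, c, terms=80):
    """Truncated antiderivative of x^n e^(-cx) / (ax + b); needs ax + b > 0."""
    r = b / a
    # resummed logarithmic part: (-1)^n (1/a) (b/a)^n e^(bc/a) ln(ax + b)
    total = (-1) ** n * r ** n * math.exp(b * c / a) * math.log(a * x + b) / a
    for m in range(terms):
        for k in range(n + m):
            total += ((-1) ** (m + k) * c ** m * r ** k * x ** (n + m - k)
                      / (a * math.factorial(m) * (n + m - k)))
    return total


n, a, b, c = 2, 2.0, 1.0, 0.7
exact = thm146(1.0, n, a, b, c) - thm146(0.0, n, a, b, c)
approx = simpson(lambda x: x ** n * math.exp(-c * x) / (a * x + b), 0.0, 1.0)
print(exact, approx)
```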
Theorem 147


I 


X 


{ax  +  by 


i  ‘^dx 


^p+m  fc  1  {p  +  m _ k _ 1 — ^  /  cy  ^  M I 

{p  +  m-k-\)\  \a)  j 

Proof. Perform the change of variables $z = ax + b$. Then $x = \frac{1}{a}(z - b)$ and $dx = \frac{1}{a}dz$. The integral becomes


y  j^-(2  —  6)j  *’^-d2  =  e‘‘  J  e  z’'’{z  —  b^dz 

=  f  e-^ z-P  E  (-6)'"2*-"*d2 

ni=0 




Concentrate  on  this  integral. 


-  (-r-hrr) 

\p+m-k-l  J 


+  (p+m-fc-l)  (a) 


((p+m-fc-l)(p+m-/:-2))  ^ 


.  ((p+*n-fc-l)(p+m-fc-2)) 

^  {zY  I 


((p+fn-A;-l)-”(p+m-fc-n) )  ^  .  ((p+m-fc-l)—(p+m-A:-n) ) 

X  e-f  z;j-(p+m-fc-n)  X  (f)"  / 


p  +  m  —  k  —  I 


((p+m-fc-l)!)  ^ 


;  ((p+m-fc-l);)  (a) 

X  /  e~a^z~^dz 


p+m-fc-l 


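Concentrate on the integral $\int e^{-cz/a}z^{-1}\,dz$. Perform the change of variables $w = \frac{c}{a}z$. Thus $z = \frac{a}{c}w$ and $dz = \frac{a}{c}\,dw$. The integral is

$$\int e^{-cz/a}z^{-1}\,dz = \int e^{-w}\,\frac{c}{a}w^{-1}\,\frac{a}{c}\,dw = \int e^{-w}w^{-1}\,dw$$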


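Consider the definite integral $\int_u^v e^{-w}w^{-1}\,dw$. It equals

$$\int_u^{\infty}e^{-w}w^{-1}\,dw - \int_v^{\infty}e^{-w}w^{-1}\,dw = -\mathrm{Ei}(-u) + \mathrm{Ei}(-v) = \left.\mathrm{Ei}(-w)\right|_u^v$$

Thus

$$\int e^{-cz/a}z^{-1}\,dz = \mathrm{Ei}\left(-\frac{c}{a}z\right)$$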


Substitute  into  the  expansion  of 


/ 


to  get 


/ 


p+m-k-l 


Y'p+m— fc-1  (p+tn-fc— 1— n)!  /c\”  *  „-~z  -  —  {p+m—k—n) 
2^n=l  (p+m—k—l)i  \a/ 


The original integral is found by substituting $z = ax + b$ into


(1) ^ 


Ep+m— fc— 1  (p+m— fc— 1— n)!  f  c\ 
(p+/n— 1)!  \a/ 


n— 1 


n,} 


□ 


P.2.6  Generalized  Even  and  Odd  Functions 


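Proposition 95 Let $f(z)$ be a generalized even function for complex $z$. Then

$$\int_{\mathbb{C}} f(z)\,dz = 2\pi\int_0^{\infty} f(r)\,r\,dr$$

Proof.

$$\int_{\mathbb{C}} f(z)\,dz = \int_{-\pi}^{\pi}\int_0^{\infty} f(re^{j\theta})\,r\,dr\,d\theta$$

by changing variables to polar coordinates. This equals $\int_{-\pi}^{\pi}\int_0^{\infty} f(r)\,r\,dr\,d\theta$ since $f(re^{j\theta}) = f(r)$ for all $\theta$. Therefore

$$\int_{\mathbb{C}} f(z)\,dz = 2\pi\int_0^{\infty} f(r)\,r\,dr$$

□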

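Proposition 96 Let $f(z)$ be a generalized even function in each component of $z \in \mathbb{C}^n$, that is, $f(r_1e^{j\theta_1}, \ldots, r_ne^{j\theta_n}) = f(r_1, \ldots, r_n)$ for all $\theta_1, \ldots, \theta_n \in \mathbb{R}$. Then

$$\int_{\mathbb{C}^n} f(z)\,dz = (2\pi)^n\int_{\mathbb{R}_+^n} f(\mathbf{r})\,r\,d\mathbf{r}$$

where $\mathbf{r} \in \mathbb{R}^n$ and $r = r_1r_2\cdots r_n$.

Proof.

$$\int_{\mathbb{C}^n} f(z)\,dz = \int f(r_1e^{j\theta_1}, \ldots, r_ne^{j\theta_n})\,r_1\cdots r_n\,dr_1\cdots dr_n\,d\theta_1\cdots d\theta_n$$

Note that each $z_i$ is undergoing a change of variables to $(r_i, \theta_i)$ rather than the usual change to pure polar coordinates, where there is a single true radial component. We can do this since the $z_i$ are functionally independent. Since $f(r_1e^{j\theta_1}, \ldots, r_ne^{j\theta_n}) = f(r_1, \ldots, r_n)$ for all $\theta_1, \ldots, \theta_n$, we get

$$\int f(r_1, \ldots, r_n)\,r_1\cdots r_n\,dr_1\cdots dr_n\,d\theta_1\cdots d\theta_n$$

Integrating over each $\theta_i \in (-\pi, \pi]$, this integral equals

$$(2\pi)^n\int_{\mathbb{R}_+^n} f(\mathbf{r})\,r\,d\mathbf{r}$$

□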
□ 
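Proposition 97 Let $f(z)$ be a generalized odd function for complex $z$. Then

$$\int_{\mathbb{C}} f(z)\,dz = 0.$$

Proof.

$$\int_{\mathbb{C}} f(z)\,dz = \int_{-\pi}^{\pi}\int_0^{\infty} f(re^{j\theta})\,r\,dr\,d\theta$$

by changing variables to polar coordinates, where $-\pi < \theta \le \pi$. Since $f$ is a generalized odd function, $f(re^{j\theta}) = e^{j\theta}f(r)$, so

$$\int_{-\pi}^{\pi}\int_0^{\infty} f(re^{j\theta})\,r\,dr\,d\theta = \int_{-\pi}^{\pi}\int_0^{\infty} e^{j\theta}f(r)\,r\,dr\,d\theta = \left(\int_{-\pi}^{\pi}e^{j\theta}\,d\theta\right)\left(\int_0^{\infty} f(r)\,r\,dr\right)$$

Since $\int_{-\pi}^{\pi}e^{j\theta}\,d\theta = 0$, the integral vanishes. □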


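Proposition 98 Let $f(z)$ be a generalized odd function for $z \in \mathbb{C}^n$. Then $\int_{\mathbb{C}^n} f(z)\,dz = 0$.

Proof.

$$\int_{\mathbb{C}^n} f(z)\,dz = \int f(r_1e^{j\theta_1}, \ldots, r_ne^{j\theta_n})\,r_1\cdots r_n\,dr_1\cdots dr_n\,d\theta_1\cdots d\theta_n$$

Shifting every angle $\theta_k$ by the same amount $\theta$ leaves the region of integration (one full period in each $\theta_k$) and the measure unchanged. Because $f$ is a generalized odd function, the shift multiplies the integrand by $e^{j\theta}$. Hence $\int f(z)\,dz = e^{j\theta}\int f(z)\,dz$ for every $\theta \in \mathbb{R}$; taking $\theta = \pi$ gives $\int f(z)\,dz = 0$. □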


P.2.7  Exponentials 

Lemma  64 


J  e  =  TT 
Proof.


J  e  =  J  e 


□ 


Lemma  65 

/+<»  ,  1 

^2(o-t)+ig-<  _  7:r(a  —  i  +  1),  Re(a  —  i -|-  1)  >  0 

-oo  2 

Proof. Perform the change of variables $y = t^2$, which implies $dy = 2t\,dt$. The integral then reduces to

$$\frac{1}{2}\int_0^{\infty}y^{a-i}e^{-y}\,dy$$

From the definition of the gamma function, we know this is

$$\frac{1}{2}\int_0^{\infty}y^{(a-i+1)-1}e^{-y}\,dy = \frac{1}{2}\Gamma(a-i+1), \quad \mathrm{Re}(a-i+1) > 0$$


□ 


Proposition  99 


/  e  “  dt  =  an 

Jc 


Proof. 
Let $y = a^{-1/2}t$, which implies $dy = a^{-1/2}\,dt$. Then

e~^^Rdy^  J 

=  a>l'^^a>l‘^s/i  —  aw 

□ 


Proposition  100 

/  +  00  1 

<2“+ie-'’'  '  =  -6“+»r(a  +  l),Re(a  +  1)  >  0 

-oo  Z 

Proof. Let $u = t^2/b$, which implies $du = 2b^{-1}t\,dt$ and $t^2 = bu$. Then the integral becomes

$$\frac{1}{2}b\int_0^{\infty}(bu)^a e^{-u}\,du = \frac{1}{2}b^{a+1}\int_0^{\infty}u^a e^{-u}\,du$$

We use the definition of the gamma function to lead us to

$$\frac{1}{2}b^{a+1}\Gamma(a+1), \quad \mathrm{Re}(a+1) > 0$$

□ 


Proposition  101 
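Proof. Let $I = \int_0^{\infty}e^{-\alpha x^2}\,dx$, where $\alpha \in \mathbb{C}$ and $x \in \mathbb{R}$. Then

$$I^2 = \left(\int_0^{\infty}e^{-\alpha x^2}\,dx\right)\left(\int_0^{\infty}e^{-\alpha y^2}\,dy\right) = \int_0^{\infty}\int_0^{\infty}e^{-\alpha(x^2+y^2)}\,dx\,dy$$

Let $x = r\cos\theta$ and $y = r\sin\theta$. Then $x^2 + y^2 = r^2$ and $dx\,dy = r\,dr\,d\theta$. This leads to

$$
\begin{aligned}
I^2 &= \int_0^{\infty}\int_0^{\pi/2}e^{-\alpha r^2}\,r\,d\theta\,dr = \frac{\pi}{2}\int_0^{\infty}e^{-\alpha r^2}\,r\,dr \\
&= -\frac{\pi}{4\alpha}\int_0^{\infty}e^{-\alpha r^2}(-2\alpha r)\,dr = -\frac{\pi}{4\alpha}\left.e^{-\alpha r^2}\right|_0^{\infty} \\
&= \lim_{b\to\infty}-\frac{\pi}{4\alpha}\left(e^{-\alpha b^2} - 1\right) = \frac{\pi}{4\alpha}
\end{aligned}
$$

for $\mathrm{Re}(\alpha) > 0$. Therefore $I = \frac{1}{2}\sqrt{\pi/\alpha}$.

This particular change of variables, solving for the square of the integral, is one I have seen applied only to this particular example.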


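Proposition 102

$$\int_{-\infty}^{\infty}e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}$$

This is a modification of integral 3.321.3 of Gradshteyn and Ryzhik [94].

Proof. Consider the integral $I = \int_{-\infty}^{\infty}e^{-\alpha x^2}\,dx$ for $\alpha \in \mathbb{C}$ and $x \in \mathbb{R}$. Then

$$I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-\alpha(x^2+y^2)}\,dx\,dy$$

Perform the change of variables $x = r\cos\theta$ and $y = r\sin\theta$. Then $x^2 + y^2 = r^2$ and $dx\,dy = r\,dr\,d\theta$. This implies

$$I^2 = \int_0^{2\pi}\int_0^{\infty}e^{-\alpha r^2}\,r\,dr\,d\theta = 2\pi\int_0^{\infty}e^{-\alpha r^2}\,r\,dr$$

Note that

$$d\left(e^{-\alpha r^2}\right) = e^{-\alpha r^2}(-2\alpha r)\,dr$$

Then

$$I^2 = -\frac{\pi}{\alpha}\int_0^{\infty}e^{-\alpha r^2}(-2\alpha r)\,dr = \lim_{b\to\infty}-\frac{\pi}{\alpha}\left.e^{-\alpha r^2}\right|_0^b = \lim_{b\to\infty}-\frac{\pi}{\alpha}\left(e^{-\alpha b^2} - 1\right)$$

Since $\alpha$ is complex, let $\alpha = \beta + j\gamma$ with $\beta = \mathrm{Re}(\alpha) > 0$. Then

$$\lim_{b\to\infty}\left|e^{-\alpha b^2}\right| = \lim_{b\to\infty}\left|e^{-\beta b^2}\right|\left|e^{-j\gamma b^2}\right| \le \lim_{b\to\infty}e^{-\beta b^2}\cdot 1 = 0$$

Therefore

$$I^2 = \frac{\pi}{\alpha}$$

This implies

$$I = \int_{-\infty}^{\infty}e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}$$

for $\mathrm{Re}(\alpha) > 0$, where the principal branch of the square root is taken. □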
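Here is a numerical check of proposition 102 for a complex $\alpha$ with $\mathrm{Re}(\alpha) > 0$. It uses `cmath.sqrt`, which returns the principal square root. Truncating the real line to $[-12, 12]$ and the sample values of $\alpha$ are assumptions of this sketch.

```python
import cmath


def gaussian_integral(alpha, half_width=12.0, n=6000):
    """Simpson's rule for the integral of exp(-alpha x^2) over [-half_width, half_width]."""
    n += n % 2
    h = 2 * half_width / n
    f = lambda x: cmath.exp(-alpha * x * x)
    s = f(-half_width) + f(half_width)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(-half_width + i * h)
    return s * h / 3


alpha = 1 + 0.5j
print(gaussian_integral(alpha), cmath.sqrt(cmath.pi / alpha))
```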


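Proposition 103 Let

$$a! = a(a-1)(a-2)(a-3)\cdots$$

where $a$ is not necessarily an integer. Only ratios such as $(a+1+k)!/a! = (a+1)(a+2)\cdots(a+1+k)$ are needed below. Then

$$\int u^m(1-u)^a\,du = -\frac{(1-u)^{a+1}}{a+1}\sum_{k=0}^{m}\frac{(a+1)!\,k!}{(a+1+k)!}\binom{m}{k}u^{m-k}(1-u)^k$$

Note that

$$\binom{m}{k}u^{m-k}(1-u)^k$$

is the probability of $k$ failures in $m$ trials when $0 < u < 1$, where $u$ is the probability of success on one trial.

Proof.

$$
\begin{array}{c|l|l}
\text{index} & \text{summing column} & \text{expansion column} \\
\hline
\{0\} & -\frac{1}{a+1}u^m(1-u)^{a+1} & +\frac{m}{a+1}\int u^{m-1}(1-u)^{a+1}\,du \\
\{1\} & -\frac{m}{(a+1)(a+2)}u^{m-1}(1-u)^{a+2} & +\frac{m(m-1)}{(a+1)(a+2)}\int u^{m-2}(1-u)^{a+2}\,du \\
\{2\} & -\frac{m(m-1)}{(a+1)(a+2)(a+3)}u^{m-2}(1-u)^{a+3} & +\frac{m(m-1)(m-2)}{(a+1)(a+2)(a+3)}\int u^{m-3}(1-u)^{a+3}\,du \\
\vdots & \quad\vdots & \quad\vdots \\
\{k\} & -\frac{m!}{(m-k)!}\frac{a!}{(a+1+k)!}u^{m-k}(1-u)^{a+1+k} & +\frac{m!}{(m-k-1)!}\frac{a!}{(a+1+k)!}\int u^{m-k-1}(1-u)^{a+1+k}\,du \\
\vdots & \quad\vdots & \quad\vdots \\
\{m\} & -\frac{m!\,a!}{(a+1+m)!}(1-u)^{a+1+m} &
\end{array}
$$

The sum of the terms in the summing column of the table is

$$
\begin{aligned}
\int u^m(1-u)^a\,du &= -\sum_{k=0}^{m}\frac{a!}{(a+1+k)!}\frac{m!}{(m-k)!}u^{m-k}(1-u)^{a+1+k} \\
&= -\frac{(1-u)^{a+1}}{a+1}\sum_{k=0}^{m}\frac{(a+1)!\,k!}{(a+1+k)!}\binom{m}{k}u^{m-k}(1-u)^k
\end{aligned}
$$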
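The generalized factorial ratios in proposition 103 can be written with gamma functions, $(a+1)!/(a+1+k)! = \Gamma(a+2)/\Gamma(a+2+k)$. This hedged sketch evaluates them that way for non-integer $a > -1$. Evaluating the antiderivative between 0 and 1 should recover the beta function $B(m+1, a+1)$. The helper name is hypothetical.

```python
import math


def prop103(u, m, a):
    """-(1-u)^(a+1)/(a+1) * sum_k (a+1)! k!/(a+1+k)! * C(m,k) u^(m-k) (1-u)^k."""
    total = 0.0
    for k in range(m + 1):
        # (a+1)! k! / (a+1+k)!, with factorials of non-integers read as Gamma functions
        ratio = math.gamma(a + 2) * math.factorial(k) / math.gamma(a + 2 + k)
        total += ratio * math.comb(m, k) * u ** (m - k) * (1 - u) ** k
    return -(1 - u) ** (a + 1) / (a + 1) * total


m, a = 3, 1.7
beta = math.gamma(m + 1) * math.gamma(a + 1) / math.gamma(m + a + 2)
print(prop103(1.0, m, a) - prop103(0.0, m, a), beta)
```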


P.3 Integrals with Hypergeometric Function of Matrix Arguments and Zonal Polynomials


Definition 89 Let $z$ be a complex number, and let

$$[a]_k \overset{\text{def}}{=} a(a+1)\cdots(a+k-1)$$

be Pochhammer's symbol. Then the generalized hypergeometric function (or series) of scalar argument $z$ is

$${}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;z) \overset{\text{def}}{=} \sum_{k=0}^{\infty}\frac{[a_1]_k\cdots[a_p]_k}{[b_1]_k\cdots[b_q]_k}\frac{z^k}{k!}$$

This is Muirhead's definition 1.3.1 [187]. This definition is an important building block for the material dealing with zonal polynomials.


Notes. The parameters $\{a_i\}_1^p$ and $\{b_j\}_1^q$ are complex numbers. No $b_j$ can be zero or a negative integer. If any $a_i$ is zero or a negative integer, the series is finite. If $p \le q$ and $|z| < \infty$, the series converges. If $p = q + 1$, the series converges if $|z| < 1$ and diverges if $|z| > 1$. If $p > q + 1$, the series diverges if $z \ne 0$.
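Here is a minimal Python sketch of definition 89 for real parameters. It sums terms using the ratio $t_{k+1}/t_k = z\prod_i(a_i+k)/[(k+1)\prod_j(b_j+k)]$ until they become negligible. It assumes the series converges at the given $z$ (see the notes above). The name `hyp_pfq` is hypothetical.

```python
import math


def hyp_pfq(a, b, z, max_terms=1000, tol=1e-17):
    """Generalized hypergeometric series pFq(a; b; z) for real parameters.

    Stops when a term is negligible or exactly zero (a terminating series
    when some a_i is zero or a negative integer).
    """
    for bj in b:
        if bj <= 0 and bj == int(bj):
            raise ValueError("no b_j may be zero or a negative integer")
    term, total = 1.0, 1.0
    for k in range(max_terms):
        ratio = z / (k + 1)
        for ai in a:
            ratio *= ai + k
        for bj in b:
            ratio /= bj + k
        term *= ratio
        total += term
        if term == 0.0 or abs(term) <= tol * abs(total):
            break
    return total


print(hyp_pfq([], [], 1.0))       # 0F0(z) = e^z, so about 2.718281828
print(hyp_pfq([2.5], [], 0.3))    # 1F0(a; z) = (1 - z)^(-a)
```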


Definition 90 Let $S$ be the set of all $n \times n$ nonsingular Hermitian matrices, $S = \{X = X^H,\ X \text{ nonsingular}\}$. Let $X \in S$. Let $p$ and $q$ be nonnegative integers. Let $\alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q$ be complex numbers such that $-\beta_j + (k-1)$ is not a nonnegative integer for $1 \le j \le q$ and $1 \le k \le n$. Let

$$[a]_k = a(a+1)\cdots(a+k-1)$$

which is Pochhammer's symbol. Let $Z_m(X)$ be the zonal polynomial of signature $m$. Define the hypergeometric function of single matrix argument $X$ by

$${}_pF_q(\alpha_1,\ldots,\alpha_p;\beta_1,\ldots,\beta_q;X) = \sum_{d=0}^{\infty}\sum_{|m|=d}\frac{[\alpha_1]_m\cdots[\alpha_p]_m}{[\beta_1]_m\cdots[\beta_q]_m}\frac{Z_m(X)}{d!}$$

There are some important special cases worth mentioning.

$${}_0F_0(X) = \operatorname{etr}(X) = \sum_{d=0}^{\infty}\sum_{|m|=d}\frac{Z_m(X)}{d!} = \sum_{d=0}^{\infty}\frac{(\operatorname{tr}X)^d}{d!}$$

$${}_1F_0(\alpha;X) = [\det(I - X)]^{-\alpha}$$

Note that ${}_pF_q(X) = {}_pF_q(\Lambda_X)$, where $\Lambda_X$ is the diagonal matrix of eigenvalues of $X$. This is Gross and Richards' definition 6.1 [96]. It is very important in the work on zonal polynomials of matrix argument.

Definition 91 Let $S$ be the set of all $n \times n$ nonsingular Hermitian matrices, $S = \{X = X^H,\ X \text{ nonsingular}\}$. Let $X, Y \in S$. Let $p$ and $q$ be nonnegative integers. Let $\alpha_1, \ldots, \alpha_p, \beta_1, \ldots, \beta_q$ be complex numbers such that $-\beta_j + (k-1)$ is not a nonnegative integer for $1 \le j \le q$ and $1 \le k \le n$. Let

$$[a]_k = a(a+1)\cdots(a+k-1)$$

be Pochhammer's symbol. Let $Z_m(X)$ be the zonal polynomial of signature $m$. Define the hypergeometric function of two matrix arguments $(X, Y)$ by

$${}_pF_q(\alpha_1,\ldots,\alpha_p;\beta_1,\ldots,\beta_q;X,Y) = \sum_{d=0}^{\infty}\sum_{|m|=d}\frac{[\alpha_1]_m\cdots[\alpha_p]_m}{[\beta_1]_m\cdots[\beta_q]_m}\frac{Z_m(X)Z_m(Y)}{d!\,Z_m(I_n)}$$
This is a complexification of Muirhead's definition 7.3.2 [187], and it is Gross and Richards' equation 4-2 [97].

Theorem 148 Let $X, Y \in S$, where $S$ is the set of all nonsingular $n \times n$ Hermitian matrices. Let $U(n)$ be the set of all $n \times n$ unitary matrices. Let $(dU)$ be the normalized Haar measure on $U(n)$. Then

$$\int_{U(n)}{}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;XU^HYU)(dU) = {}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;X,Y)$$

This is a complexification and slight modification of Muirhead's theorem 7.3.3 [187], and it is Gross and Richards' equation 4-3 [97].

Proof.

$$
\begin{aligned}
\int_{U(n)}{}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;XU^HYU)(dU) &= \int_{U(n)}\sum_{d=0}^{\infty}\sum_{|m|=d}\frac{[a_1]_m\cdots[a_p]_m}{[b_1]_m\cdots[b_q]_m}\frac{Z_m(XU^HYU)}{d!}(dU) \\
&= \sum_{d=0}^{\infty}\sum_{|m|=d}\frac{[a_1]_m\cdots[a_p]_m}{[b_1]_m\cdots[b_q]_m}\frac{1}{d!}\int_{U(n)}Z_m(XU^HYU)(dU)
\end{aligned}
$$

Applying Gross and Richards' proposition 5.5 [96] to the integral of the zonal polynomial, we get

$$\sum_{d=0}^{\infty}\sum_{|m|=d}\frac{[a_1]_m\cdots[a_p]_m}{[b_1]_m\cdots[b_q]_m}\frac{Z_m(X)Z_m(Y)}{d!\,Z_m(I_n)} = {}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;X,Y)$$

□
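Proposition 104 Let $z \in \mathbb{C}$. Then

$${}_0F_1\left(\frac{n}{2};\frac{z^2}{4}\right) = \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left[\frac{1}{2}(n-1)\right]}\int_0^{\pi}e^{z\cos\theta}[\sin\theta]^{n-2}\,d\theta$$

This is a complexification of Muirhead's lemma 1.3.2 [187].

Proof. This is Muirhead's [187] proof with some steps filled in. By definition of the exponential function,

$$e^{z\cos\theta} = \sum_{k=0}^{\infty}\frac{1}{k!}(z\cos\theta)^k$$

This converges for all $|z| < \infty$. Then by Fubini [230], since the series converges, we can interchange the sum and integral. Thus

$$\int_0^{\pi}e^{z\cos\theta}[\sin\theta]^{n-2}\,d\theta = \sum_{k=0}^{\infty}\frac{z^k}{k!}\int_0^{\pi}[\cos\theta]^k[\sin\theta]^{n-2}\,d\theta$$

Observe that $[\sin\theta]^{n-2}$ is an even function about $\frac{\pi}{2}$ on the interval $(0, \pi)$. When $k$ is odd, $[\cos\theta]^k$ is an odd function about $\frac{\pi}{2}$ on the interval $(0, \pi)$. When $k$ is even, $[\cos\theta]^k$ is even. Thus

$$
\int_0^{\pi}[\cos\theta]^k[\sin\theta]^{n-2}\,d\theta =
\begin{cases}
2\int_0^{\pi/2}[\cos\theta]^k[\sin\theta]^{n-2}\,d\theta, & k \text{ even} \\
0, & k \text{ odd}
\end{cases}
$$

Thus, if we let $2m = k$ for even $k$, then

$$\int_0^{\pi}e^{z\cos\theta}[\sin\theta]^{n-2}\,d\theta = \sum_{m=0}^{\infty}\frac{z^{2m}}{(2m)!}\,2\int_0^{\pi/2}[\cos\theta]^{2m}[\sin\theta]^{n-2}\,d\theta$$

Perform a change of variables. Let $x = \sin^2\theta$. Then

$$\cos^2\theta = 1 - x$$

and

$$dx = 2\sin\theta\cos\theta\,d\theta$$

which implies

$$d\theta = \frac{1}{2}x^{-1/2}(1-x)^{-1/2}\,dx$$

The limits are changed from $\theta \in (0, \frac{\pi}{2})$ to $x \in (0, 1)$. Then

$$2\int_0^{\pi/2}[\cos\theta]^{2m}[\sin\theta]^{n-2}\,d\theta = 2\int_0^1(1-x)^m x^{(n-2)/2}\,\frac{1}{2}x^{-1/2}(1-x)^{-1/2}\,dx$$

Merely switching notation from $m$ back to $k$ (but not doing a change of variables), this integral is

$$\int_0^1(1-x)^{k-\frac{1}{2}}x^{\frac{n-3}{2}}\,dx$$

We rewrite the exponents to place this integral into the form of the definition of the beta function, as given in Abramowitz and Stegun equation 6.2.1 [1]:

$$\int_0^1(1-x)^{(k+\frac{1}{2})-1}x^{\left(\frac{n-1}{2}\right)-1}\,dx = B\left(\frac{n-1}{2}, k+\frac{1}{2}\right) = \frac{\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(k+\frac{1}{2}\right)}{\Gamma\left(k+\frac{n}{2}\right)}$$

which is a commonly used identity for the beta function. The beta function is an important function in theoretical statistics.

With these changes, we now have

$$\int_0^{\pi}e^{z\cos\theta}[\sin\theta]^{n-2}\,d\theta = \sum_{k=0}^{\infty}\frac{z^{2k}}{(2k)!}\,\frac{\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(k+\frac{1}{2}\right)}{\Gamma\left(k+\frac{n}{2}\right)}$$

Notes.

$$\Gamma\left(k+\frac{n}{2}\right) = \left(k+\frac{n}{2}-1\right)\left(k+\frac{n}{2}-2\right)\cdots\left(\frac{n}{2}\right)\Gamma\left(\frac{n}{2}\right) = \left[\frac{n}{2}\right]_k\Gamma\left(\frac{n}{2}\right)$$

Also note that

$$\frac{\Gamma\left(k+\frac{1}{2}\right)}{(2k)!} = \frac{\left(k-\frac{1}{2}\right)\left(k-\frac{3}{2}\right)\cdots\frac{1}{2}\,\Gamma\left(\frac{1}{2}\right)}{(2k)(2k-1)(2k-2)\cdots 2\cdot 1} = \frac{(2k-1)(2k-3)(2k-5)\cdots 1\cdot\Gamma\left(\frac{1}{2}\right)}{2^k(2k)(2k-1)(2k-2)\cdots 2\cdot 1} = \frac{\Gamma\left(\frac{1}{2}\right)}{4^k\,k!}$$

Putting everything together,

$$\int_0^{\pi}e^{z\cos\theta}[\sin\theta]^{n-2}\,d\theta = \frac{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\sum_{k=0}^{\infty}\frac{1}{[n/2]_k}\frac{(z^2/4)^k}{k!} = \frac{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\,{}_0F_1\left(\frac{n}{2};\frac{z^2}{4}\right)$$

Notice in the argument of ${}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;y)$ that, since $p = 0$, there are no parameters $\{a_1,\ldots,a_p\}$. We have only $b_1 = \frac{n}{2}$ and $y = \frac{z^2}{4}$. We write ${}_0F_1(b_1;y)$ since we have no $\{a_i\}$. So the lemma is proven.

For additional connections with the beta function, see Herz (p. 480, bottom) [106]. □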
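Proposition 104 can be checked numerically for a few values of $n$. This sketch sums the ${}_0F_1$ series directly and evaluates the angular integral by Simpson's rule. For $n = 3$ both sides should equal $\sinh(z)/z$. The function names and sample values are assumptions.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]."""
    n += n % 2
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def hyp0f1(b, y, terms=200):
    """0F1(b; y) = sum_k y^k / ([b]_k k!)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= y / ((b + k) * (k + 1))
    return total


def prop104_rhs(n, z):
    """Gamma(n/2) / (Gamma(1/2) Gamma((n-1)/2)) * int_0^pi e^{z cos t} sin^{n-2} t dt, n >= 2."""
    const = math.gamma(n / 2) / (math.gamma(0.5) * math.gamma((n - 1) / 2))
    integral = simpson(lambda t: math.exp(z * math.cos(t)) * math.sin(t) ** (n - 2),
                       0.0, math.pi)
    return const * integral


z = 1.7
for n in (3, 5):
    print(n, hyp0f1(n / 2, z * z / 4), prop104_rhs(n, z))
```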
Theorem 149 Let $a_i, b_j, c, z \in \mathbb{C}$. Then

$$\int_0^{\infty}e^{-zt}t^{c-1}\,{}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;kt)\,dt = \Gamma(c)z^{-c}\,{}_{p+1}F_q\left(a_1,\ldots,a_p,c;b_1,\ldots,b_q;\frac{k}{z}\right)$$

which holds for

$$\{p < q,\ \mathrm{Re}(c) > 0,\ \mathrm{Re}(z) > 0\}$$

and

$$\{p = q,\ \mathrm{Re}(c) > 0,\ \mathrm{Re}(z) > \mathrm{Re}(k)\}$$

This is Muirhead's lemma 1.3.3 [187], stated without proof.

Proof. The proof is straightforward with the following observations, which come from the definition of the gamma function.

$$\int_0^{\infty}e^{-zt}t^{m+c-1}\,dt = z^{-(m+c)}\Gamma(m+c) \tag{P.1}$$

$$\Gamma(m+c) = (m+c-1)\Gamma(m+c-1) = \cdots = (m+c-1)(m+c-2)\cdots(c+1)c\,\Gamma(c) = [c]_m\Gamma(c) \tag{P.2}$$

Now, substitute the definition of ${}_pF_q$ into our problem:

$$\int_0^{\infty}e^{-zt}t^{c-1}\,{}_pF_q(a_1,\ldots,a_p;b_1,\ldots,b_q;kt)\,dt = \int_0^{\infty}e^{-zt}t^{c-1}\sum_{m=0}^{\infty}\frac{[a_1]_m\cdots[a_p]_m}{[b_1]_m\cdots[b_q]_m}\frac{(kt)^m}{m!}\,dt$$

Switching the order of summation and integration is allowed since the sum converges. This gives us

$$\sum_{m=0}^{\infty}\frac{[a_1]_m\cdots[a_p]_m}{[b_1]_m\cdots[b_q]_m}\frac{k^m}{m!}\int_0^{\infty}e^{-zt}t^{m+c-1}\,dt = \sum_{m=0}^{\infty}\frac{[a_1]_m\cdots[a_p]_m}{[b_1]_m\cdots[b_q]_m}\frac{k^m}{m!}\,z^{-(m+c)}\Gamma(m+c)$$

where we used our first observation, equation P.1. Together with equation P.2, this leads us to

$$\Gamma(c)z^{-c}\sum_{m=0}^{\infty}\frac{[a_1]_m\cdots[a_p]_m[c]_m}{[b_1]_m\cdots[b_q]_m}\frac{(k/z)^m}{m!} = \Gamma(c)z^{-c}\,{}_{p+1}F_q\left(a_1,\ldots,a_p,c;b_1,\ldots,b_q;\frac{k}{z}\right)$$

□
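Here is a numerical spot check of theorem 149 with $p = 0$ and $q = 1$, so the condition $p < q$ holds. The integral over $(0, \infty)$ is truncated at $t = 40$, where the integrand is negligible for the sample values chosen. That truncation, and the helper names, are assumptions of the sketch.

```python
import math


def hyp_pfq(a, b, z, max_terms=500, tol=1e-17):
    """Scalar pFq series (definition 89) for real parameters."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        ratio = z / (k + 1)
        for ai in a:
            ratio *= ai + k
        for bj in b:
            ratio /= bj + k
        term *= ratio
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def simpson(f, lo, hi, n=8000):
    """Composite Simpson's rule on [lo, hi]."""
    n += n % 2
    step = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4 * sum(f(lo + i * step) for i in range(1, n, 2))
    s += 2 * sum(f(lo + i * step) for i in range(2, n, 2))
    return s * step / 3


c, b1, k, z = 2.0, 1.5, 1.0, 2.0
lhs = simpson(lambda t: math.exp(-z * t) * t ** (c - 1) * hyp_pfq([], [b1], k * t), 0.0, 40.0)
rhs = math.gamma(c) * z ** (-c) * hyp_pfq([c], [b1], k / z)
print(lhs, rhs)
```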


P.4 Integrals with Complex Multivariate Gamma Function

Definition 92 Complex Multivariate Gamma Function, $C\Gamma_m(a)$.

$$C\Gamma_m(a) \overset{\text{def}}{=} \pi^{m(m-1)/2}\prod_{i=1}^{m}\Gamma(a-i+1) = \pi^{m(m-1)/2}\Gamma(a)\Gamma(a-1)\cdots\Gamma(a-m+1) = \pi^{m(m-1)/2}\prod_{i=1}^{m}\Gamma(a-m+i)$$

where $\mathrm{Re}(a - m + 1) > 0$. This is a complexification of Muirhead's theorem 2.1.12 [187]. It is James' equation 83 [120]. This is $\tilde{\Gamma}_p(a)$ in Patil (p. 7) [205].
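For real $a > m - 1$, definition 92 can be evaluated through log-gamma values to avoid overflow. Below is a hedged sketch; the name `complex_multivariate_gamma` is hypothetical. The check uses the recursion $C\Gamma_m(a) = \pi^{m-1}\Gamma(a)\,C\Gamma_{m-1}(a-1)$, which follows directly from the product form.

```python
import math


def complex_multivariate_gamma(m, a):
    """C Gamma_m(a) = pi^(m(m-1)/2) * prod_{i=1}^m Gamma(a - i + 1), for real a > m - 1."""
    if a - m + 1 <= 0:
        raise ValueError("requires a - m + 1 > 0")
    log_value = 0.5 * m * (m - 1) * math.log(math.pi)
    log_value += sum(math.lgamma(a - i + 1) for i in range(1, m + 1))
    return math.exp(log_value)


print(complex_multivariate_gamma(2, 3.0), 2 * math.pi)   # pi * Gamma(3) * Gamma(2)
```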
Discussion. The gamma function is defined by

$$\Gamma(z) = \int_0^{\infty}t^{z-1}e^{-t}\,dt, \quad \mathrm{Re}(z) > 0$$

where $z$ is complex. Recall the properties of the univariate $\Gamma(z)$:

$$\Gamma(n+1) = n!$$

$$\Gamma(z+1) = z\Gamma(z)$$


Definition 93 The real multivariate gamma function is defined to be

$$\Gamma_m(a) = \pi^{m(m-1)/4}\prod_{i=1}^{m}\Gamma\!\left(a - \tfrac{1}{2}(i-1)\right),$$

where $\operatorname{Re}\left[a - \tfrac{1}{2}(m-1)\right] > 0$. This is Muirhead theorem 2.1.12.

This function appears in the denominator of the density function of the real Wishart distribution $W_p(n, \Sigma)$, where $m = p$ and $a = n/2$. $\Gamma_m(a)$ also shows up in integrals that involve zonal polynomials, and in the cumulative distribution function and expected moments of the Type I multivariate beta distribution.

$C\Gamma_m(a)$ appears in the denominator of the density function of the complex Wishart distribution $CW_p(n, \Sigma)$, where $m = p$ and $a = n$.
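For comparison with the complex case, here is a minimal Python sketch of definition 93 for real $a > (m-1)/2$. It evaluates the real Wishart normalizing factor $\Gamma_p(n/2)$ for one illustrative choice, $p = 3$ and $n = 7$. The function name `real_mv_gamma` is hypothetical.

```python
import math


def real_mv_gamma(m, a):
    """Gamma_m(a) = pi^{m(m-1)/4} * prod_{i=1}^{m} Gamma(a - (i-1)/2), for real a > (m-1)/2."""
    if a <= (m - 1) / 2:
        raise ValueError("requires a > (m - 1)/2")
    log_value = 0.25 * m * (m - 1) * math.log(math.pi)
    for i in range(1, m + 1):
        log_value += math.lgamma(a - 0.5 * (i - 1))
    return math.exp(log_value)


# Factor appearing in the denominator of the real Wishart W_p(n, Sigma) density: Gamma_p(n/2)
p, n = 3, 7  # illustrative values
wishart_gamma = real_mv_gamma(p, n / 2)
```

Note the different powers of $\pi$ and the half-integer steps in the real case, compared with the integer steps of $C\Gamma_m(a)$.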
Lemma 66 For $\operatorname{Re}(a) > m - 1$,

$$\int_{A=A^H>0}\operatorname{etr}(-A)\,(\det A)^{a-m}\,(dA) = \pi^{m(m-1)/2}\prod_{i=1}^{m}\Gamma(a-i+1) = C\Gamma_m(a).$$

This is Muirhead's definition 2.1.10 [187] and James' equation (83) [120]. Herz [106] identifies $C\Gamma_p(m)$ as the generalization to matrix variables of the Eulerian integral of the second kind.

Proof. This proof draws from Srivastava's derivation for the standard complex Wishart distribution $CW_p(n, I)$. We begin with Srivastava's main result, his equation 4 (p. 314) [256],

$$P(B) = C_1\,|\det B|^{m-p}\,f(B),$$

where $B^H = B > 0$ and $C_1$ is a constant. Since $P(B)$ is a density, it integrates to 1:

$$1 = \int_{B^H=B>0} P(B)\,(dB) = C_1\int_{B^H=B>0} |\det B|^{m-p}\,f(B)\,(dB).$$

Choose $f(B)$ as we did in Srivastava's derivation of the density function for the complex Wishart distribution (see theorem 67). There $f(B)$ is a constant multiple of $\operatorname{etr}(-B)$, and $C_1$ was evaluated. Combining the constants from $C_1$ and $f(B)$ and substituting into our integral, we obtain

$$1 = \frac{1}{\pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(m-i+1)}\int_{B^H=B>0}\operatorname{etr}(-B)\,|\det B|^{m-p}\,(dB).$$

Multiplying both sides by the reciprocal of the constant in front of the integral, we get

$$\int_{B^H=B>0}\operatorname{etr}(-B)\,|\det B|^{m-p}\,(dB) = \pi^{p(p-1)/2}\prod_{i=1}^{p}\Gamma(m-i+1) = C\Gamma_p(m).$$

This is lemma 66 with dimension $p$ in place of $m$ and exponent parameter $m$ in place of $a$.

□
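As an independent hand check of lemma 66, the case $m = 2$ can be worked out directly. This calculation is not part of the original. It assumes that a $2 \times 2$ Hermitian matrix is written as $A = \begin{pmatrix} x & u + iv \\ u - iv & y \end{pmatrix}$, with $(dA) = dx\,dy\,du\,dv$ (the product of the differentials of its real coordinates, in the spirit of the $(dZ)$ convention of appendix Q), and that $a > 1$ is real. Then $A > 0$ exactly when $x > 0$, $y > 0$, and $xy > u^2 + v^2$; also $\det A = xy - u^2 - v^2$ and $\operatorname{tr} A = x + y$. Polar coordinates $\rho^2 = u^2 + v^2$ handle the off-diagonal part.

```latex
\begin{aligned}
\int_{u^2+v^2<xy} (xy-u^2-v^2)^{a-2}\,du\,dv
  &= 2\pi\int_0^{\sqrt{xy}} (xy-\rho^2)^{a-2}\,\rho\,d\rho
   = \pi\,\frac{(xy)^{a-1}}{a-1}, \\
\int_{A>0}\operatorname{etr}(-A)\,(\det A)^{a-2}\,(dA)
  &= \frac{\pi}{a-1}\int_0^\infty\!\!\int_0^\infty e^{-(x+y)}\,(xy)^{a-1}\,dx\,dy
   = \frac{\pi\,\Gamma(a)^2}{a-1} \\
  &= \pi\,\Gamma(a)\,\Gamma(a-1) = C\Gamma_2(a).
\end{aligned}
```

The last step uses $\Gamma(a) = (a-1)\,\Gamma(a-1)$. The result matches definition 92 with $m = 2$.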


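Theorem 150 Let $\Sigma = \Sigma^H$ and $A = A^H$ be $m \times m$ complex matrices, where $\Sigma$ and $A$ are positive definite, and let $\operatorname{Re}(a) > m - 1$. Then the matrix Laplace transform of $(\det A)^{a-m}$, evaluated at $\Sigma^{-1}$, is

$$\int_{A>0}\operatorname{etr}(-\Sigma^{-1}A)\,(\det A)^{a-m}\,(dA) = (\det\Sigma)^a\,C\Gamma_m(a) = \mathcal{L}_{\Sigma^{-1}}\left\{(\det A)^{a-m}\right\}.$$

This is a complexification of Muirhead theorem 2.1.11. This is also Herz equation (1.1) [106].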

Proof. This is a complexification of Muirhead's proof. By theorem 119, we can decompose $\Sigma$ as $\Sigma = BB^H$. By convention we use the symbol $\Sigma^{1/2}$ for $B$ and call it the square root of $\Sigma$. Thus $\Sigma = \Sigma^{1/2}\Sigma^{H/2}$ and $\Sigma^{-1} = \Sigma^{-H/2}\Sigma^{-1/2}$. Also, recall that theorem 38 says that the Jacobian $J(A \to V)$ of the transformation $A = \Sigma^{1/2}V\Sigma^{H/2}$, where $V = V^H$, is given by

$$|\det\Sigma^{1/2}|^{2m} = (\det\Sigma)^m.$$
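With these preliminaries, perform the change of variables in our integral:

$$\begin{aligned}
\int_{A>0}\operatorname{etr}(-\Sigma^{-1}A)\,(\det A)^{a-m}\,(dA)
&= \int_{V>0}\operatorname{etr}\!\left(-\Sigma^{-1}\Sigma^{1/2}V\Sigma^{H/2}\right)\left(\det\left[\Sigma^{1/2}V\Sigma^{H/2}\right]\right)^{a-m}(\det\Sigma)^m\,(dV)\\
&= \int_{V>0}\operatorname{etr}\!\left(-\Sigma^{H/2}\Sigma^{-1}\Sigma^{1/2}V\right)(\det\Sigma)^{a-m}\,(\det V)^{a-m}\,(\det\Sigma)^m\,(dV)\\
&= \int_{V>0}\operatorname{etr}(-V)\,(\det V)^{a-m}\,(dV)\,(\det\Sigma)^a.
\end{aligned}$$

The second line uses the cyclic invariance of the trace and $\det\left[\Sigma^{1/2}V\Sigma^{H/2}\right] = |\det\Sigma^{1/2}|^2\det V = \det\Sigma\,\det V$. The third line uses $\Sigma^{H/2}\Sigma^{-1}\Sigma^{1/2} = \Sigma^{H/2}\Sigma^{-H/2}\Sigma^{-1/2}\Sigma^{1/2} = I$. From lemma 66 we recognize the integral as the complex multivariate gamma function, giving us

$$(\det\Sigma)^a\,C\Gamma_m(a),$$

which completes the proof. □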

Note. If you divide the integrand by $(\det\Sigma)^a\,C\Gamma_m(a)$, the result is a density function:

$$\frac{(\det A)^{a-m}\,\operatorname{etr}(-\Sigma^{-1}A)}{(\det\Sigma)^a\,C\Gamma_m(a)}.$$

This is the density function for the complex Wishart distribution $CW_m(a, \Sigma)$.

□
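The scalar case $m = 1$ of theorem 150 gives a quick numerical check. Here $A$ and $\Sigma$ reduce to positive numbers $x$ and $\sigma$, the Jacobian is trivial, and $C\Gamma_1(a) = \Gamma(a)$. The sketch below is in standard-library Python. The Simpson-rule helper, the cutoff at 80, and the values $a = 3$ and $\sigma = 1.7$ are illustrative assumptions. It does not test the matrix case.

```python
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


# m = 1: etr(-Sigma^{-1} A) (det A)^{a-m} becomes exp(-x/sigma) * x^(a-1)
a, sigma = 3.0, 1.7
lhs = simpson(lambda x: math.exp(-x / sigma) * x ** (a - 1), 0.0, 80.0)  # tail beyond 80 is negligible
rhs = sigma ** a * math.gamma(a)  # (det Sigma)^a * C Gamma_1(a), with C Gamma_1 = Gamma
```

Both values should be close to $2 \times 1.7^3$.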


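Proposition 105

$$\int_{\mathbb{C}^{p\times p}}\left[\det(E^HE)\right]^{a-p}\operatorname{etr}(-E^HE)\,(dE) = \frac{\pi^{p^2}\,C\Gamma_p(a)}{C\Gamma_p(p)},$$

where $E \in M_p(\mathbb{C})$ and $E$ is unstructured. This proposition was motivated by the proof of theorem 7.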


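Proof. We recognize

$$\pi^{-p^2}\operatorname{etr}(-E^HE)$$

as a probability density function for the complex matrix normal distribution $CN_{p,p}(0, I, I)$. Then the integral is $\pi^{p^2}$ times the expected value of

$$\left[\det(E^HE)\right]^{a-p}.$$

If $E \sim CN_{p,p}(0, I, I)$, then $G = E^HE$ has the complex Wishart distribution $CW_p(p, I)$. By theorem 79, we know

$$\mathcal{E}\left\{[\det(G)]^{a-p}\right\} = \frac{C\Gamma_p(a)}{C\Gamma_p(p)}.$$

Thus

$$\int_{\mathbb{C}^{p\times p}}\left[\det(E^HE)\right]^{a-p}\operatorname{etr}(-E^HE)\,(dE) = \pi^{p^2}\,\mathcal{E}\left\{[\det(G)]^{a-p}\right\} = \pi^{p^2}\,\frac{C\Gamma_p(a)}{C\Gamma_p(p)}.$$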

Appendix  Q 
NOTATION

Q.1 Names of Variables

Sometimes the choice between Latin and Greek letters for a special variable is governed by broad consensus within a scientific community, and this overrides otherwise stated conventions. Such exceptions are noted in the next section.

Matrix:  single  upper  case  Latin  or  Greek  letters  such  as 

y4,Z,r,0,  A,E,  S, 

Vector:  usually  single  lower  case  Latin  or  sometimes  Greek  letters  such  as 

Scalar:  usually  lower  case  Greek  or  sometimes  Latin  letters  such  as 

$\alpha, \beta, \gamma, \sigma, a, b, c, d$

Deterministic  variables:  usually  chosen  from  early  in  the  alphabet. 
Random  variables:  usually  chosen  from  late  in  the  alphabet. 

Distribution  parameters:  usually  Greek  letters. 


Q.2  Special  Notation 
Special meanings are attached to the following symbols. This reservation is sometimes violated due to a paucity of available symbols.

$\mathcal{E}$ always denotes the expected value operator.

$i = \sqrt{-1}$ when $i$ is not an index. The letter $j$ will not be used for $\sqrt{-1}$. I decided to use $i$ since $i$ is not current.

$C$ identifies a distribution or function as the complex-variable version of the appended symbol. By itself, or with a superscript, $\mathbb{C}$ refers to the set of complex numbers or the product space of complex numbers.

$A^H$ is the Hermitian transpose of the matrix (or vector) $A$. The Hermitian transpose is the transpose of the complex conjugate of the matrix (or vector). Note that when applied to a scalar, this is merely the complex conjugate.

$A^T$ is the transpose of the matrix (or vector) $A$.

is  reserved  for  the  scalar  variance  parameter  of  a  distribution. 

8  is  used  as  a  noncentrality  parameter  for  the  complex  Wishart  distribution, 
which  is  a  matrix. 

$\Sigma$ is reserved for the matrix covariance parameter of a distribution. For the complex matrix normal distribution, this is the covariance between column vectors. Some authors call this the variance-covariance matrix or the dispersion matrix.

E is reserved for the row covariance matrix parameter for the complex matrix normal distribution.

$\mu$ is reserved for the mean value parameter of a distribution.

A  always  refers  to  a  singular  value, 
always  refers  to  an  eigenvalue. 

A  always  refers  to  the  rectangular  matrix  having  the  non-zero  singular 
values  on  its  main  diagonal. 

always  refers  to  the  square  matrix  having  the  non-zero  eigenvalues  on 
its  main  diagonal. 

I  almost  always  refers  to  a  sample  singular  value.  Sometimes  I  is  an  integer 
index. 

P  always  refers  to  a  sample  eigenvalue. 

L almost always refers to the rectangular matrix having the non-zero sample singular values on its main diagonal.

always refers to the square matrix having the non-zero sample eigenvalues on its main diagonal.

W  almost  always  refers  to  a  Wishart  matrix.  Other  matrices  may  also  be 
Wishart  matrices. 

$\phi$ is often reserved for use as the characteristic function of a distribution.

A(i4)  is  a  diagonal  matrix  consisting  of  the  elements  on  the  diagonal  of 
matrix  A. 

$\{x_i\}_1^n$ means the sequence $x_1, x_2, \ldots, x_n$.

$(dZ)$ is a differential form of the elements of $Z$, which may be a matrix. This notation is used in connection with probability density functions and is shorthand for the absolute value of the product of the element differentials, such as

$$dz_{11}\,dz_{12}\cdots dz_{1p}\,dz_{21}\cdots dz_{2p}\cdots dz_{np}.$$

Equivalently, this is

$$\left|dz_{11}\wedge dz_{12}\wedge\cdots\wedge dz_{1p}\wedge dz_{21}\wedge\cdots\wedge dz_{2p}\wedge\cdots\wedge dz_{np}\right|.$$
Q.3 Selected Abbreviations

IEEE   Institute of Electrical and Electronics Engineers
IRE    Institute of Radio Engineers, which later became IEEE (PGIT Vol. 1, February 1953)
JASA   Journal of the Acoustical Society of America (Vol. 1, October 1929)
JASA   Journal of the American Statistical Association (Vol. 1, 1888/1889)
MLE    Maximum Likelihood Estimate
RKHS   Reproducing Kernel Hilbert Space
SIAM   Society for Industrial and Applied Mathematics
       Note: Journal of the SIAM, later named SIAM Journal on Applied Mathematics
UMVU   Uniformly Minimum Variance Unbiased estimate


Q.4  Reference  Author  Names 


The  name  of  an  author  of  a  reference  used  in  direct  support  of  this  research 
is  printed  with  this  type  style  in  the  bibliography  to  distinguish  that  reference 
from  those  used  only  for  presentation  of  background  and  history.  Prior  to  now, 
there  was  no  canonical  way  of  efficiently  making  this  discrimination. 


Q.5  Taxonomy  of  Logic 
An attempt has been made to conform to Solow's taxonomy of statements of formal logic [252] (p. 37). His classifications are defined below. This historical taxonomy is not uniformly implemented. Exceptions were made where I judged a proposition in the context of other similar propositions. Any hierarchical taxonomy will fail because we have a multidimensional lattice of logic.

1.  Proposition:  A  statement  of  interest  that  you  are  trying  to  prove. 

2.  Theorem:  (Subjectively)  extremely  important  propositions. 

3. Lemma: Proposition used as a step in proving a theorem.

4. Corollary: Proposition whose veracity follows immediately from a theorem.


5.  Axiom:  Statement  accepted  without  proof. 


VITA 


Curtis Irvin Caldwell was born on 04 March 1947 in Columbus, Ohio, United States of America. His father, Col. Elmer Irvin Caldwell, was a career U.S. Army soldier who served during World War II in North Africa, Italy, and France, and also in the wars in Korea and Viet Nam. As an Army dependent, Curtis lived for over a year in Japan, and in Germany for over three years beginning shortly after the Hungarian Revolution, and during the Czechoslovakian Uprising and the Second Berlin Crisis. It was during these years that he developed a deep sense of appreciation for the value of freedom that not everyone in the world enjoyed. From his mother, May Alice Wing Caldwell of Worthington, Ohio, Curtis learned that possession of knowledge and power incurs the obligation of its stewardship for the benefit of others. From his brother, Harold Earnest Caldwell, Curtis learned to love inquiry and analytical thought.

Curtis Caldwell attended grammar school in Germany, and Francis C. Hammond High School in Alexandria, Virginia, USA. He completed a B.S. in Computer Science at the University of South Carolina in 1972 under Dr. William Hines Linder, and an M.A. in Mathematical Sciences with a dual concentration in Statistics and Computer Science from the University of North Florida under Drs. William J. Wilson and Yap Siong Chua. It was there that Curtis developed a love for seeing other people learn.

In  addition  to  his  interests  in  underwater  acoustics  and  signal  processing, 
Curtis  has  interests  in  Christian  systematic  theology  and  citizenship.  He 
counts  it  a  privilege  to  serve  a  nation  of  free  people  under  God. 

He is married to Susan Marion Belcher Caldwell, the daughter of Annie Lou Belcher and Jack Belcher, a liberator of the Nardheim Concentration Camp of World War II. His son is Joshua Benjamin Lee Caldwell, named for the spy sent into the Promised Land who reported that God's promise is good, the favorite son, and the patriot-gentleman-soldier Robert E. Lee.
$|\cdot|$ will always refer to the magnitude. For $z \in \mathbb{C}$ with $z = xe^{i\theta}$, where $x, \theta \in \mathbb{R}$ and $x \ge 0$, we have $|z| = x$. $|\cdot|$ will not be used for the determinant.

det A always refers to the determinant of a matrix. tr A is the trace of a matrix, which is the sum of all the elements on the main diagonal.

etr  A  =  exp(tr(A)) 

exp A is the exponential function, which usually refers to the scalar function $e^x$. It has also been defined for a matrix argument A, in which case exp A is a matrix.

0 is the zero matrix. When a matrix of all zero entries multiplies another matrix, the result is still a matrix of all zero entries, with appropriate dimensions. Rather than using a different notation for each null matrix, I have simply used 0 where the dimensions are assumed to be correct. Thus, I have also dispensed with the need for $0^T$ as the transpose of the null matrix.

diag(A) is the ordered n-tuple of the elements on the main diagonal of the $n \times n$ matrix A. The context determines whether this n-tuple is a row vector or a column vector of n elements.

diag($b_1, \ldots, b_n$) is an $n \times n$ diagonal matrix with $(b_1, \ldots, b_n)$ as the elements on the diagonal.
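Several of the matrix conventions above can be mirrored in a few lines of standard-library Python. This is a sketch only. Matrices are represented as lists of rows, and the helper names (`hermitian_transpose`, `matmul`, `trace`, `etr`, `diag_of`, `diag_matrix`) are hypothetical. The example also shows that $\operatorname{tr}(A^HA)$ is the sum of the squared magnitudes of the entries, which is why $\operatorname{etr}(-E^HE)$ is real and positive.

```python
import cmath


def hermitian_transpose(A):
    """A^H: transpose of the complex conjugate; A is a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    """tr A: sum of the elements on the main diagonal."""
    return sum(A[i][i] for i in range(len(A)))


def etr(A):
    """etr A = exp(tr A)."""
    return cmath.exp(trace(A))


def diag_of(A):
    """diag(A): ordered n-tuple of the main-diagonal elements of a square A."""
    return tuple(A[i][i] for i in range(len(A)))


def diag_matrix(*b):
    """diag(b_1, ..., b_n): n x n matrix with b_1, ..., b_n on the diagonal, 0 elsewhere."""
    n = len(b)
    return [[b[i] if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1 + 2j, 3 - 1j], [0.5j, 2 + 0j]]
AH = hermitian_transpose(A)
gram = matmul(AH, A)  # A^H A, the form inside etr(-E^H E) in proposition 105
frobenius_sq = sum(abs(x) ** 2 for row in A for x in row)
```

Here $\operatorname{tr}(A^HA) = 5 + 10 + 0.25 + 4 = 19.25$.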

$\wedge$ is (1) usually reserved for the exterior product operator (wedge product) used with differential forms and (2) sometimes reserved for the nested sum


